Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis. (arXiv:2201.07281v1 [cs.CL])
Jan. 20, 2022, 2:10 a.m. | Hang Jiang, Yining Hua, Doug Beeferman, Deb Roy
Source: cs.CL updates on arXiv.org (arxiv.org)
Social media data such as Twitter messages ("tweets") pose a particular
challenge to NLP systems because of their short, noisy, and colloquial nature.
Tasks such as Named Entity Recognition (NER) and syntactic parsing require
highly domain-matched training data for good performance. While there are some
publicly available annotated datasets of tweets, they are all purpose-built for
solving one task at a time. As yet there is no complete training corpus for
both syntactic analysis (e.g., part of speech tagging, dependency …