Web: http://arxiv.org/abs/2206.07238

June 16, 2022, 1:12 a.m. | Mukhlis Amien, Chong Feng, Heyan Huang

cs.CL updates on arXiv.org

Twitter contains an abundance of linguistic data from the real world. We
examine Twitter for user-generated content in low-resource languages such as
local Indonesian. For NLP to work in Indonesian, it must account for the local
dialects, geographic context, and regional culture that influence Indonesian
languages. This paper identifies the problems we faced when constructing a
Local Indonesian NLP dataset. Furthermore, we are developing a framework for
creating, collecting, and classifying Local Indonesian datasets for NLP, using
Twitter's geolocation tool for automatic annotation.
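
To make the idea of geolocation-based automatic annotation concrete, here is a minimal sketch, not the authors' framework. It assumes each tweet is already available as a dictionary with a hypothetical `coordinates` field holding a (latitude, longitude) pair, and it uses rough, illustrative bounding boxes that are not taken from the paper. The names `Region`, `REGIONS`, and `annotate` are likewise invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A labeled rectangular area given by latitude/longitude bounds."""
    label: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Assumption: coarse, hand-picked boxes for illustration only.
# A real dataset would need proper regional boundaries.
REGIONS = [
    Region("javanese", -8.5, -6.5, 108.8, 114.5),
    Region("sundanese", -7.8, -5.9, 106.0, 108.8),
]


def annotate(tweet: dict) -> Optional[str]:
    """Return a regional label for a geotagged tweet, or None.

    Assumption: `tweet["coordinates"]` is a (lat, lon) tuple or missing.
    The first matching region wins.
    """
    coords = tweet.get("coordinates")
    if coords is None:
        return None
    lat, lon = coords
    for region in REGIONS:
        if region.contains(lat, lon):
            return region.label
    return None
```

Tweets without coordinates, or with coordinates outside every box, are left unlabeled rather than guessed, so they can be filtered out or annotated by other means.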

