Web: http://arxiv.org/abs/2206.07238

June 16, 2022, 1:12 a.m. | Mukhlis Amien, Chong Feng, Heyan Huang

cs.CL updates on arXiv.org

Twitter contains an abundance of real-world linguistic data. We examine
Twitter for user-generated content in low-resource languages such as the local
languages of Indonesia. NLP for Indonesian must take into account local
dialects, geographic context, and the influence of regional culture on
Indonesian languages. This paper identifies the problems we faced when
constructing a Local Indonesian NLP dataset. Furthermore, we are developing a
framework for creating, collecting, and classifying Local Indonesian datasets
for NLP, which uses Twitter's geolocation tool for automatic annotation.
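The idea of annotating tweets automatically from their geolocation can be illustrated with a minimal sketch. It assumes each tweet has already been reduced to a dict with optional `lat`/`lon` fields (a hypothetical format, not the Twitter API's response schema) and that each target region is approximated by a rectangular latitude/longitude bounding box. The region labels and coordinates in `EXAMPLE_REGIONS` are placeholders, not real Indonesian administrative boundaries, and the paper's actual framework may work differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical rectangular stand-in for a region boundary."""
    label: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Inclusive check on both axes.
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Placeholder regions for illustration only (NOT real boundaries).
EXAMPLE_REGIONS = [
    BoundingBox("region_a", -8.0, -6.0, 106.0, 108.0),
    BoundingBox("region_b", -8.0, -6.0, 110.0, 112.0),
]


def annotate(lat: Optional[float], lon: Optional[float],
             regions: list[BoundingBox]) -> Optional[str]:
    """Return the label of the first region containing the point, or None."""
    if lat is None or lon is None:
        return None  # tweet has no geotag, so it cannot be auto-labeled
    for region in regions:
        if region.contains(lat, lon):
            return region.label
    return None  # geotagged, but outside every target region


def annotate_tweets(tweets: list[dict],
                    regions: list[BoundingBox]) -> list[dict]:
    """Keep only tweets that fall in a region, adding a 'label' field."""
    labeled = []
    for tweet in tweets:
        label = annotate(tweet.get("lat"), tweet.get("lon"), regions)
        if label is not None:
            labeled.append({**tweet, "label": label})
    return labeled


if __name__ == "__main__":
    sample = [
        {"text": "tweet 1", "lat": -7.0, "lon": 107.0},
        {"text": "tweet 2", "lat": -7.0, "lon": 111.0},
        {"text": "tweet 3", "lat": None, "lon": None},
    ]
    for t in annotate_tweets(sample, EXAMPLE_REGIONS):
        print(t["text"], t["label"])
```

Bounding boxes keep the sketch short but overlap poorly with real, irregular borders; a fuller version would test points against region polygons, and would still leave un-geotagged tweets unlabeled.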


