March 28, 2024, 4:48 a.m. | Amir Hossein Kargaran, Fran\c{c}ois Yvon, Hinrich Sch\"utze

cs.CL updates on arXiv.org arxiv.org

arXiv:2309.13320v2 Announce Type: replace
Abstract: We present GlotScript, an open resource and tool for low resource writing system identification. GlotScript-R is a resource that provides the attested writing systems for more than 7,000 languages. It is compiled by aggregating information from existing writing system resources. GlotScript-T is a writing system identification tool that covers all 161 Unicode 15.0 scripts. For an input text, it returns its script distribution where scripts are identified by ISO 15924 codes. We also present two …

arxiv cs.cl identification low tool type writing

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Senior ML Engineer

@ Carousell Group | Ho Chi Minh City, Vietnam

Data and Insight Analyst

@ Cotiviti | Remote, United States