April 10, 2024, 4:47 a.m. | Blnd Yaseen, Hossein Hassani

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.06101v1 Announce Type: new
Abstract: Kurdish libraries have many historical publications that were printed back in the early days when printing devices were brought to Kurdistan. Having a good Optical Character Recognition (OCR) to help process these publications and contribute to the Kurdish languages resources which is crucial as Kurdish is considered a low-resource language. Current OCR systems are unable to extract text from historical documents as they have many issues, including being damaged, very fragile, having many marks left …

abstract arxiv character recognition cs.cl devices good languages libraries making ocr optical optical character recognition printing process publications recognition resources type

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Software Engineer, Data Tools - Full Stack

@ DoorDash | Pune, India

Senior Data Analyst

@ Artsy | New York City