Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines
April 10, 2024, 4:47 a.m. | Blnd Yaseen, Hossein Hassani
cs.CL updates on arXiv.org
Abstract: Kurdish libraries hold many historical publications that were printed in the early days when printing devices were brought to Kurdistan. A good Optical Character Recognition (OCR) system that helps process these publications and contributes to Kurdish language resources is crucial, as Kurdish is considered a low-resource language. Current OCR systems are unable to extract text from these historical documents because the documents have many issues, including being damaged, very fragile, having many marks left …