Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines

April 10, 2024, 4:47 a.m. | Blnd Yaseen, Hossein Hassani

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.06101v1 Announce Type: new
Abstract: Kurdish libraries hold many historical publications that were printed in the early days after printing devices were brought to Kurdistan. A good Optical Character Recognition (OCR) system would help process these publications and contribute to Kurdish language resources, which is crucial because Kurdish is considered a low-resource language. Current OCR systems are unable to extract text from historical documents because these documents have many issues, including being damaged, very fragile, having many marks left …


