Feb. 8, 2023, 7:37 p.m. | Allen Institute for AI

Allen Institute for AI www.youtube.com

No Language Left Behind: Unlocking Text Data for Under-Resourced
Shruti Rijhwani

NLP systems are limited by the availability of text data. Because machine-readable text exists in only a few hundred languages, most of the world’s languages are under-represented in modern language technologies.
Text data exists in many more languages! However, it is locked away in printed books and handwritten documents, and training a high-performance optical character recognition (OCR) system to extract the text is challenging for most under-resourced …

