all AI news
Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?
May 14, 2024, 4:49 a.m. | Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty
cs.CL updates on arXiv.org arxiv.org
Abstract: Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and different morphosyntactic properties. This ambiguity goes beyond word-sense disambiguation (WSD), and may include token segmentation into multiple word units. Previous research on MRLs claimed that standardly trained pre-trained language models (PLMs) based on word-pieces may not sufficiently capture the internal structure …
abstract arxiv beyond cs.cl language language models languages multiple sense standard type word words
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Senior Machine Learning Engineer
@ GPTZero | Toronto, Canada
Sr. Data Operations
@ Carousell Group | West Jakarta, Indonesia
Senior Analyst, Business Intelligence & Reporting
@ Deutsche Bank | Bucharest
Business Intelligence Subject Matter Expert (SME) - Assistant Vice President
@ Deutsche Bank | Cary, 3000 CentreGreen Way
Enterprise Business Intelligence Specialist
@ NAIC | Kansas City
Senior Business Intelligence (BI) Developer - Associate
@ Deutsche Bank | Cary, 3000 CentreGreen Way