L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi. (arXiv:2211.11187v2 [cs.CL] UPDATED)
Nov. 23, 2022, 2:17 a.m. | Ananya Joshi, Aditi Kajale, Janhavi Gadre, Samruddhi Deode, Raviraj Joshi
cs.CL updates on arXiv.org arxiv.org
Sentence representations from vanilla BERT models do not work well on
sentence-similarity tasks. Sentence-BERT models trained specifically on STS or
NLI datasets have been shown to provide state-of-the-art performance. However,
building such models for low-resource languages is not straightforward, owing to
the lack of these specialized datasets. This work focuses on two low-resource
Indian languages, Hindi and Marathi. We train sentence-BERT models for these
languages using synthetic NLI and STS datasets prepared via machine
translation. We show that the strategy …
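The abstract evaluates sentence embeddings on STS-style similarity tasks, where pairs of sentences are scored by comparing their embedding vectors, typically with cosine similarity. A minimal sketch of that comparison step, using toy vectors in place of real SBERT model outputs (an actual model would emit, e.g., 768-dimensional embeddings per sentence):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dim "sentence embeddings" standing in for SBERT outputs.
emb_a = np.array([0.2, 0.8, 0.1, 0.4])
emb_b = np.array([0.25, 0.75, 0.05, 0.5])   # semantically close to a
emb_c = np.array([-0.9, 0.1, 0.7, -0.3])    # semantically distant from a

print(cosine_similarity(emb_a, emb_b))  # high score: similar pair
print(cosine_similarity(emb_a, emb_c))  # low score: dissimilar pair
```

In STS benchmarking these cosine scores are correlated (e.g. via Spearman rank correlation) with human similarity judgments to rate the embedding model.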