Nov. 23, 2022, 2:13 a.m. | Ananya Joshi, Aditi Kajale, Janhavi Gadre, Samruddhi Deode, Raviraj Joshi

cs.LG updates on arXiv.org arxiv.org

Sentence representations from vanilla BERT models do not work well on
sentence similarity tasks. Sentence-BERT models specifically trained on STS or
NLI datasets are shown to provide state-of-the-art performance. However,
building these models for low-resource languages is not straightforward due to
the lack of these specialized datasets. This work focuses on two low-resource
Indian languages, Hindi and Marathi. We train Sentence-BERT models for these
languages using synthetic NLI and STS datasets prepared using machine
translation. We show that the strategy …
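
As a rough illustration of what a sentence similarity task involves, the sketch below pools token vectors into a single sentence vector and compares two sentences by cosine similarity. Mean pooling and cosine similarity are common choices for Sentence-BERT style models, but the abstract does not say which pooling or scoring the authors use, so treat both as assumptions. The token vectors are made-up toy values, not the output of any real model, and the helper names `mean_pool` and `cosine_similarity` are hypothetical.

```python
import math


def mean_pool(token_vectors):
    """Average a list of token vectors into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token embeddings (assumed values for illustration only).
sentence_a = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # two tokens
sentence_b = [[1.0, 1.0, 2.0]]                   # one token
sentence_c = [[-1.0, 0.0, 0.0]]                  # one token

emb_a = mean_pool(sentence_a)
emb_b = mean_pool(sentence_b)
emb_c = mean_pool(sentence_c)

# emb_a and emb_b point in the same direction, so they score high;
# emb_c points away from emb_a, so it scores low.
sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"similarity(a, b) = {sim_ab:.3f}")
print(f"similarity(a, c) = {sim_ac:.3f}")
```

In an STS-style evaluation, scores like these would be compared against human similarity judgments for each sentence pair.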

