all AI news
IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages. (arXiv:2203.05437v2 [cs.CL] UPDATED)
Oct. 28, 2022, 1:16 a.m. | Aman Kumar, Himani Shrotriya, Prachi Sahu, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Amogh Mishra, Mitesh M. Khapra, Pratyush Kumar
cs.CL updates on arXiv.org arxiv.org
Natural Language Generation (NLG) for non-English languages is hampered by
the scarcity of datasets in these languages. In this paper, we present the
IndicNLG Benchmark, a collection of datasets for benchmarking NLG for 11 Indic
languages. We focus on five diverse tasks, namely, biography generation using
Wikipedia infoboxes, news headline generation, sentence summarization,
paraphrase generation and, question generation. We describe the created
datasets and use them to benchmark the performance of several monolingual and
multilingual baselines that leverage pre-trained sequence-to-sequence …
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne