May 25, 2022, 1:12 a.m. | Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Yuan-Fang Li, Yong-Bin Kang, Rifat Shahriyar

cs.CL updates on arXiv.org

We present CrossSum, a large-scale cross-lingual abstractive summarization
dataset comprising 1.7 million article-summary samples in 1500+ language pairs.
We create CrossSum by aligning identical articles written in different
languages via cross-lingual retrieval from a multilingual summarization
dataset. We propose a multi-stage data sampling algorithm to effectively train
a cross-lingual summarization model capable of summarizing an article in any
target language. We also propose LaSE, a new metric for automatically
evaluating model-generated summaries, and show that it correlates strongly with
ROUGE. Performance …
