CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summarization for 1500+ Language Pairs. (arXiv:2112.08804v2 [cs.CL] UPDATED)
cs.CL updates on arXiv.org
We present CrossSum, a large-scale cross-lingual abstractive summarization
dataset comprising 1.7 million article-summary samples in 1500+ language pairs.
We create CrossSum by aligning identical articles written in different
languages via cross-lingual retrieval from a multilingual summarization
dataset. We propose a multi-stage data sampling algorithm to effectively train
a cross-lingual summarization model capable of summarizing an article in any
target language. We also propose LaSE, a new metric for automatically
evaluating model-generated summaries, and show that it correlates strongly
with ROUGE. Performance …