Web: http://arxiv.org/abs/2209.11035

Sept. 23, 2022, 1:16 a.m. | Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

cs.CL updates on arXiv.org

The zero-shot cross-lingual ability of models pretrained on multilingual and
even monolingual corpora has spurred many hypotheses to explain this intriguing
empirical result. However, due to the costs of pretraining, most research uses
public models whose pretraining methodology, such as the choice of
tokenization, corpus size, and computational budget, might differ drastically.
When researchers pretrain their own models, they often do so under a
constrained budget, and the resulting models might underperform significantly
compared to SOTA models. These experimental differences


