Nov. 3, 2022, 1:12 a.m. | Michalis Korakakis, Andreas Vlachos

cs.LG updates on arXiv.org

Despite strong performance on many sequence-to-sequence tasks, autoregressive models trained with maximum likelihood estimation suffer from exposure bias, i.e., the discrepancy between the ground-truth prefixes used during training and the model-generated prefixes used at inference time. Scheduled sampling is a simple and empirically successful approach that addresses this issue by incorporating model-generated prefixes into training. However, it has been argued that scheduled sampling is an inconsistent training objective, leading to models that ignore the prefixes altogether. In this paper, we conduct systematic …
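
The scheduled sampling idea mentioned in the abstract can be made concrete with a minimal sketch, assuming a toy PyTorch GRU decoder; the names `Decoder`, `scheduled_sampling_loss`, and `sampling_prob` are illustrative and not from the paper, and the original formulation anneals the sampling probability over training rather than fixing it.

```python
# Minimal scheduled-sampling sketch (assumed setup, not the paper's code):
# with probability `sampling_prob`, the decoder's own prediction replaces
# the ground-truth token as the next prefix token during training.
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRUCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def step(self, token, hidden):
        hidden = self.rnn(self.embed(token), hidden)
        return self.out(hidden), hidden

def scheduled_sampling_loss(decoder, targets, hidden, sampling_prob):
    """Token-level cross-entropy where the prefix mixes ground-truth and
    model-generated tokens; sampling_prob would normally follow a schedule."""
    batch, length = targets.shape
    loss_fn = nn.CrossEntropyLoss()
    token = targets[:, 0]  # start from the first ground-truth token
    losses = []
    for t in range(1, length):
        logits, hidden = decoder.step(token, hidden)
        losses.append(loss_fn(logits, targets[:, t]))
        # Choose the next prefix token: model prediction vs. ground truth.
        use_model = torch.rand(batch) < sampling_prob
        predicted = logits.argmax(dim=-1)
        token = torch.where(use_model, predicted, targets[:, t])
    return torch.stack(losses).mean()

# Tiny usage example with random data.
vocab, hidden = 100, 64
dec = Decoder(vocab, hidden)
tgt = torch.randint(0, vocab, (8, 12))  # batch of 8 sequences, length 12
loss = scheduled_sampling_loss(dec, tgt, torch.zeros(8, hidden), sampling_prob=0.25)
loss.backward()
```

Setting `sampling_prob` to 0 recovers standard teacher forcing, while raising it toward 1 exposes the model to more of its own (possibly erroneous) prefixes, which is the mechanism scheduled sampling uses to reduce exposure bias.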

