The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation. (arXiv:2205.06618v1 [cs.CL])
cs.LG updates on arXiv.org
Vocabulary selection, or lexical shortlisting, is a well-known technique for
reducing the latency of Neural Machine Translation models by constraining the
set of allowed output words during inference. The chosen set is typically
determined by the parameters of a separately trained alignment model,
independently of the source-sentence context at inference time. While prior
work reports that vocabulary selection is competitive on automatic quality
metrics, we show
that it can fail to select the right set of output words, particularly for
semantically non-compositional linguistic …
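The mechanism described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: the lexical table `lex_table` and its probabilities, the helper names `build_shortlist` and `constrained_argmax`, the `top_k` cutoff, the always-allowed words, and the decoder scores are all hypothetical. The example uses the German idiom "ins Gras beißen" (literally "to bite into the grass", meaning "to die") to show how a shortlist built word by word, without sentence context, can exclude the output word the model would otherwise pick.

```python
from typing import Dict, Iterable, Set

# Hypothetical lexical translation table p(target | source), standing in for a
# separately trained alignment model. All entries are toy values.
LexTable = Dict[str, Dict[str, float]]


def build_shortlist(
    source_tokens: Iterable[str],
    lex_table: LexTable,
    top_k: int,
    always_allowed: Set[str],
) -> Set[str]:
    """Union of the top-k translations of each source token plus a fixed set.

    Each source token is looked up on its own, so the shortlist does not
    depend on the surrounding sentence context.
    """
    shortlist = set(always_allowed)
    for tok in source_tokens:
        candidates = lex_table.get(tok, {})
        best = sorted(candidates, key=lambda t: candidates[t], reverse=True)[:top_k]
        shortlist.update(best)
    return shortlist


def constrained_argmax(scores: Dict[str, float], shortlist: Set[str]) -> str:
    """Pick the highest-scoring output word among shortlisted words only.

    A real decoder gets its speed-up by computing output-layer scores only for
    shortlisted words; here all scores are given up front for simplicity.
    """
    allowed = {w: s for w, s in scores.items() if w in shortlist}
    if not allowed:
        raise ValueError("no scored word is in the shortlist")
    return max(allowed, key=lambda w: allowed[w])


# Toy input: "er beißt ins Gras" is idiomatic for "he dies".
lex_table: LexTable = {
    "er": {"he": 0.9, "it": 0.1},
    "beißt": {"bites": 0.85, "bite": 0.15},
    "ins": {"into": 0.8, "in": 0.2},
    "Gras": {"grass": 0.95, "lawn": 0.05},
}
source = ["er", "beißt", "ins", "Gras"]
shortlist = build_shortlist(source, lex_table, top_k=2, always_allowed={"the", "</s>"})

# Hypothetical decoder scores at one step; the unconstrained model prefers "dies".
scores = {"dies": 3.1, "bites": 2.0, "grass": 1.5, "he": 0.2}

print("dies" in shortlist)                    # False
print(max(scores, key=lambda w: scores[w]))   # dies
print(constrained_argmax(scores, shortlist))  # bites
```

Because every shortlist entry comes from a single source word, no entry covers the idiomatic meaning, so the constrained decoder is forced onto a literal word even though the full model scored the correct one highest.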