GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation. (arXiv:2207.14467v2 [cs.CL] UPDATED)
cs.LG updates on arXiv.org
The Transformer architecture, built by stacking a sequence of encoder and decoder
layers, has driven significant progress in neural machine translation.
However, the vanilla Transformer mainly exploits the top-layer representation,
assuming that the lower layers provide trivial or redundant information, and thus
ignores bottom-layer features that are potentially valuable. In this work,
we propose the Group-Transformer model (GTrans), which flexibly divides the
multi-layer representations of both the encoder and the decoder into different groups
and then fuses these group features to generate target words. To corroborate …
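As a rough illustration of the layer-grouping-and-fusion idea, the sketch below splits a stack of layer outputs into groups, pools each group, and fuses the group features with learned weights. This is not the authors' implementation: the group count, mean-pooling, and softmax-weighted fusion are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class GroupedLayerFusion(nn.Module):
    """Illustrative sketch: divide per-layer representations into groups,
    pool each group, then fuse the group features with learned weights.
    (Grouping, pooling, and fusion choices are assumptions, not GTrans's
    exact formulation.)"""

    def __init__(self, d_model: int, num_layers: int, num_groups: int):
        super().__init__()
        assert num_layers % num_groups == 0
        self.num_groups = num_groups
        self.group_size = num_layers // num_groups
        # One learnable scalar weight per group for the final fusion.
        self.fusion_weights = nn.Parameter(torch.ones(num_groups))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, layer_outputs: list) -> torch.Tensor:
        # layer_outputs: list of [batch, seq_len, d_model] tensors, one per layer.
        groups = [
            torch.stack(
                layer_outputs[i * self.group_size:(i + 1) * self.group_size]
            ).mean(dim=0)  # pool the layers inside each group
            for i in range(self.num_groups)
        ]
        weights = torch.softmax(self.fusion_weights, dim=0)
        fused = sum(w * g for w, g in zip(weights, groups))
        return self.norm(fused)


# Hypothetical usage: fuse the outputs of a 6-layer encoder into 3 groups.
fusion = GroupedLayerFusion(d_model=512, num_layers=6, num_groups=3)
dummy_layers = [torch.randn(2, 10, 512) for _ in range(6)]
fused_repr = fusion(dummy_layers)  # [2, 10, 512], passed on to generate target words
```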
arxiv, machine translation, neural machine translation, transformer, translation