Nov. 8, 2022, 2:16 a.m. | Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li

cs.CL updates on arXiv.org arxiv.org

The Transformer architecture, built by stacking encoder and decoder network
layers, has driven significant progress in neural machine translation.
However, the vanilla Transformer mainly exploits the top-layer representation,
assuming that the lower layers provide trivial or redundant information, and
thus ignores bottom-layer features that are potentially valuable. In this work,
we propose the Group-Transformer model (GTrans), which flexibly divides the
multi-layer representations of both encoder and decoder into different groups
and then fuses these group features to generate target words. To corroborate …
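The abstract only outlines the grouping-and-fusion idea, so the following is a minimal sketch of that idea rather than the authors' implementation: layer outputs from a stack are partitioned into groups, each group is aggregated, and the group features are fused into a single representation. The equal-sized groups, mean aggregation, and softmax-weighted fusion are illustrative assumptions, as is the `GroupedLayerFusion` module name.

```python
# Hypothetical sketch of fusing grouped layer representations (assumptions noted above).
import torch
import torch.nn as nn


class GroupedLayerFusion(nn.Module):
    def __init__(self, num_layers: int, num_groups: int, d_model: int):
        super().__init__()
        assert num_layers % num_groups == 0, "assume equal-sized groups"
        self.num_groups = num_groups
        self.group_size = num_layers // num_groups
        # One learned scalar weight per group, softmax-normalized at fusion time.
        self.group_weights = nn.Parameter(torch.zeros(num_groups))
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, layer_states: list[torch.Tensor]) -> torch.Tensor:
        # layer_states: one [batch, seq_len, d_model] tensor per layer.
        groups = [
            torch.stack(
                layer_states[i * self.group_size:(i + 1) * self.group_size]
            ).mean(dim=0)
            for i in range(self.num_groups)
        ]  # each group feature: [batch, seq_len, d_model]
        weights = torch.softmax(self.group_weights, dim=0)
        fused = sum(w * g for w, g in zip(weights, groups))
        return self.proj(fused)


# Usage: fuse the outputs of a 6-layer encoder split into 3 groups.
fusion = GroupedLayerFusion(num_layers=6, num_groups=3, d_model=512)
states = [torch.randn(2, 10, 512) for _ in range(6)]
fused = fusion(states)  # shape: [2, 10, 512]
```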

arxiv machine machine translation neural machine translation transformer translation
