Aug. 3, 2022, 1:10 a.m. | Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumsh

cs.LG updates on arXiv.org

In this work, we demonstrate that multilingual large-scale
sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising
and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners
than decoder-only models on various tasks. In particular, we train a 20 billion
parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B)
and show that it achieves state-of-the-art (SOTA) performance on 1-shot
summarization tasks, outperforming a much larger 540B PaLM decoder model.
AlexaTM 20B also achieves SOTA in 1-shot machine translation, …
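To make the "few-shot learner" framing concrete, the sketch below shows 1-shot in-context prompting with a multilingual seq2seq model via Hugging Face transformers. It is only an illustration of the general setup: the `google/mt5-small` checkpoint is a small publicly available stand-in, not AlexaTM 20B, and the prompt format, example texts, and generation settings are assumptions rather than the paper's actual evaluation protocol.

```python
# Minimal sketch of 1-shot summarization prompting with a multilingual
# seq2seq model. NOTE: "google/mt5-small" is an illustrative stand-in
# checkpoint, NOT AlexaTM 20B, and the prompt format below is an
# assumption, not the exact format used in the paper.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # stand-in model for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One in-context demonstration (the "1-shot" example) followed by the query.
demo_article = "The city council approved the new park budget on Tuesday."
demo_summary = "City council approves park budget."
query_article = "Researchers released a 20B-parameter multilingual seq2seq model."

prompt = (
    f"Article: {demo_article}\nSummary: {demo_summary}\n\n"
    f"Article: {query_article}\nSummary:"
)

# The encoder reads the full prompt; the decoder generates the summary.
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because a seq2seq model encodes the entire prompt bidirectionally before decoding, the in-context demonstration conditions generation differently than in a decoder-only model, which is the contrast the paper's few-shot comparisons explore.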

Tags: arxiv, few-shot learning, learning, scale, seq2seq
