Web: http://arxiv.org/abs/2206.07808

June 17, 2022, 1:12 a.m. | Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chau

cs.CL updates on arXiv.org

We present results from a large-scale experiment on pretraining encoders with
non-embedding parameter counts ranging from 700M to 9.3B, their subsequent
distillation into smaller models ranging from 17M to 170M parameters, and their
application to the Natural Language Understanding (NLU) component of a virtual
assistant system. Though we train using 70% spoken-form data, our teacher
models perform comparably to XLM-R and mT5 when evaluated on the written-form
Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second
stage of pretraining on our …
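The abstract's core recipe is pretraining large teacher encoders and then distilling them into much smaller students. Below is a minimal sketch of the soft-target (KL-divergence) loss typically used for that kind of teacher-to-student distillation; the temperature value, tensor shapes, and function name are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: batch of 8 examples with a 1000-way output head
# (e.g. masked-token or intent logits).
teacher_logits = torch.randn(8, 1000)                        # frozen teacher outputs
student_logits = torch.randn(8, 1000, requires_grad=True)    # student outputs
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```

In practice this soft-target term is usually combined with the student's ordinary task loss (e.g. masked language modeling or NLU classification) via a weighted sum.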

Tags: alexa, arxiv, language model, natural language, systems
