Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
April 12, 2024, 4:47 a.m. | Nathan Godey, Éric de la Clergerie, Benoît Sagot
cs.CL updates on arXiv.org
Abstract: Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of smaller counterparts. However, it has been observed that smaller models can suffer from saturation, characterized as a drop in performance at some advanced point in training followed by a plateau. In this paper, we find that such saturation can be explained …
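For context on the term in the title: the softmax bottleneck refers to the fact that a language model's next-token logits are a linear map from a d-dimensional hidden state into a V-dimensional vocabulary, so the logit matrix over any set of contexts has rank at most d, a constraint that is most restrictive when d is small. The sketch below is purely illustrative and not code from the paper; the sizes d, V, and n_contexts are arbitrary choices for the demonstration.

```python
import numpy as np

# Hypothetical sizes, chosen only to illustrate the bottleneck:
# a small hidden dimension d versus a much larger vocabulary V.
d, V, n_contexts = 8, 1000, 256

rng = np.random.default_rng(0)
H = rng.standard_normal((n_contexts, d))  # hidden states, one row per context
W = rng.standard_normal((d, V))           # output (unembedding) matrix

logits = H @ W  # (n_contexts, V) matrix of next-token logits

# The rank of the logit matrix is capped by d, no matter how many contexts
# or how large the vocabulary: the linear softmax head cannot express more
# than d independent directions of variation across contexts.
print(np.linalg.matrix_rank(logits))  # -> 8, i.e. at most d, far below V
```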