Upper and lower memory capacity bounds of transformers for next-token prediction
May 24, 2024, 4:42 a.m. | Liam Madden, Curtis Fox, Christos Thrampoulidis
cs.LG updates on arXiv.org
Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are …
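To make the task definition concrete, here is a minimal sketch of what "predicting the next-token conditional probability distribution" means. The stand-in model below (random logits via NumPy) is purely illustrative and is not the paper's decoder-only transformer construction; in practice the logits would be computed from the context by the transformer.

```python
import numpy as np

# Toy illustration of the next-token prediction task described in the abstract:
# given a context sequence of tokens, output a probability distribution over
# the vocabulary for the next token.

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat", "<eos>"]

def next_token_distribution(context_ids, vocab_size):
    # Hypothetical stand-in for a decoder-only transformer: map the context
    # to a vector of logits (here just random numbers for illustration).
    logits = rng.normal(size=vocab_size)
    # Softmax turns the logits into the next-token conditional distribution.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

context = [0, 1]  # token ids for "the cat"
p = next_token_distribution(context, len(vocab))
print({tok: round(float(pr), 3) for tok, pr in zip(vocab, p)})
```

The paper's question can then be read as: over how many distinct context sequences can such a model be made to match prescribed next-token distributions exactly, as a function of its size?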