Upper and lower memory capacity bounds of transformers for next-token prediction
May 24, 2024, 4:42 a.m. | Liam Madden, Curtis Fox, Christos Thrampoulidis
cs.LG updates on arXiv.org
Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are …
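To make the task definition concrete, here is a minimal sketch of what "predicting the next-token conditional probability distribution" means. The stand-in model below (random logits via NumPy) is purely illustrative and is not the paper's decoder-only transformer construction; in practice the logits would be computed from the context by the transformer.

```python
import numpy as np

# Toy illustration of the next-token prediction task described in the abstract:
# given a context sequence of tokens, output a probability distribution over
# the vocabulary for the next token.

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat", "<eos>"]

def next_token_distribution(context_ids, vocab_size):
    # Hypothetical stand-in for a decoder-only transformer: map the context
    # to a vector of logits (here just random numbers for illustration).
    logits = rng.normal(size=vocab_size)
    # Softmax turns the logits into the next-token conditional distribution.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

context = [0, 1]  # token ids for "the cat"
p = next_token_distribution(context, len(vocab))
print({tok: round(float(pr), 3) for tok, pr in zip(vocab, p)})
```

The paper's question can then be read as: over how many distinct context sequences can such a model be made to match prescribed next-token distributions exactly, as a function of its size?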