May 24, 2024, 4:42 a.m. | Liam Madden, Curtis Fox, Christos Thrampoulidis

cs.LG updates on arXiv.org

arXiv:2405.13718v1 Announce Type: new
Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the conditional probability distribution of the next token. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences for which a decoder-only transformer can interpolate next-token distributions has not been established. To fill this gap, we prove upper and lower bounds on this number, which are …
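To make the task concrete, here is a minimal, hedged sketch of what "predicting the next-token conditional distribution" means computationally. The toy encoder below (mean pooling with randomly initialized parameters) is a placeholder and is not the paper's construction; all names (`vocab_size`, `embed_dim`, `next_token_distribution`) are hypothetical. A real decoder-only transformer would replace the pooling step with causal self-attention.

```python
import numpy as np

# Toy stand-in for a trained decoder-only transformer (hypothetical parameters).
rng = np.random.default_rng(0)
vocab_size = 8   # hypothetical vocabulary size
embed_dim = 4    # hypothetical embedding width

embedding = rng.normal(size=(vocab_size, embed_dim))
output_proj = rng.normal(size=(embed_dim, vocab_size))

def next_token_distribution(context: list[int]) -> np.ndarray:
    """Return p(next token | context) for a toy mean-pooled context encoder."""
    # A decoder-only transformer would use causal self-attention here;
    # mean pooling just keeps the sketch self-contained and runnable.
    context_vec = embedding[context].mean(axis=0)
    logits = context_vec @ output_proj
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

probs = next_token_distribution([2, 5, 1])
print(probs, probs.sum())  # a valid probability distribution over vocab_size tokens
```

In this framing, the paper's question of "capacity" asks how many distinct context sequences such a model can be made to map exactly to prescribed next-token distributions.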
