Upper and lower memory capacity bounds of transformers for next-token prediction
May 24, 2024, 4:42 a.m. | Liam Madden, Curtis Fox, Christos Thrampoulidis
cs.LG updates on arXiv.org (arxiv.org)
Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences for which a decoder-only transformer can interpolate the next-token distributions has not been established. To fill this gap, we prove upper and lower bounds on this number, which are …
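The setup the abstract describes — a decoder-only transformer mapping each context sequence to a next-token conditional distribution — can be sketched briefly. The snippet below is a minimal illustration only, not the paper's construction or its bounds; the vocabulary size, model width, and other hyperparameters are arbitrary assumptions for the example.

```python
# Minimal sketch (illustrative assumptions throughout): a tiny decoder-only
# transformer that maps a context sequence to a next-token distribution.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=16, d_model=32, n_heads=2, max_len=8):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                nn.Linear(d_model, d_model))
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, ids):  # ids: (batch, seq_len) of token indices
        seq_len = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq_len, device=ids.device))
        # Causal mask: position i may only attend to positions <= i.
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=ids.device), diagonal=1)
        h, _ = self.attn(x, x, x, attn_mask=causal)
        h = h + self.ff(h)
        logits = self.out(h[:, -1, :])        # prediction from the last position
        return logits.softmax(dim=-1)         # next-token conditional distribution

model = TinyDecoder()
ctx = torch.randint(0, 16, (4, 8))            # 4 distinct context sequences
print(model(ctx).shape)                       # (4, 16): one distribution per context
```

Interpolation in the sense of the abstract would mean choosing the weights of such a model so that its output distribution exactly matches a prescribed target for every context in a finite set; the paper's upper and lower bounds concern how many distinct contexts a given architecture can handle this way.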
Tags: abstract, arxiv, become, capacity, cs.lg, decoder, distribution, math.oc, memory, next, prediction, probability, token, tokens, transformers, type, words