May 24, 2024, 4:42 a.m. | Liam Madden, Curtis Fox, Christos Thrampoulidis

cs.LG updates on arXiv.org

arXiv:2405.13718v1 Announce Type: new
Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the conditional probability distribution of the next token. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences for which a decoder-only transformer can interpolate next-token distributions has not been established. To fill this gap, we prove upper and lower bounds on this number, which are …
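To make the task concrete, here is a minimal, hedged sketch of what "predicting the next-token conditional distribution" means computationally. The toy encoder below (mean pooling with randomly initialized parameters) is a placeholder and is not the paper's construction; all names (`vocab_size`, `embed_dim`, `next_token_distribution`) are hypothetical. A real decoder-only transformer would replace the pooling step with causal self-attention.

```python
import numpy as np

# Toy stand-in for a trained decoder-only transformer (hypothetical parameters).
rng = np.random.default_rng(0)
vocab_size = 8   # hypothetical vocabulary size
embed_dim = 4    # hypothetical embedding width

embedding = rng.normal(size=(vocab_size, embed_dim))
output_proj = rng.normal(size=(embed_dim, vocab_size))

def next_token_distribution(context: list[int]) -> np.ndarray:
    """Return p(next token | context) for a toy mean-pooled context encoder."""
    # A decoder-only transformer would use causal self-attention here;
    # mean pooling just keeps the sketch self-contained and runnable.
    context_vec = embedding[context].mean(axis=0)
    logits = context_vec @ output_proj
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

probs = next_token_distribution([2, 5, 1])
print(probs, probs.sum())  # a valid probability distribution over vocab_size tokens
```

In this framing, the paper's question of "capacity" asks how many distinct context sequences such a model can be made to map exactly to prescribed next-token distributions.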
