March 18, 2024, 11:25 a.m. | /u/Massive_Horror9038

Deep Learning www.reddit.com

I'm going to start a study group focused on large language models. The participants are PhD students in Computer Science with a math background. I would like to first study some theoretical properties of transformers (or attention). Some of the students may not yet know exactly how a transformer is formulated, so I'll also need to cover that.
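For the formulation part, the usual starting point is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Below is a minimal NumPy sketch of that single formula (function name and the random test shapes are my own, just for illustration), not a full transformer:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity logits
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V                               # convex combination of value rows

# toy check: 4 queries attending over 6 key/value pairs, dimension 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

A full transformer block then adds learned projections for Q, K, V, multiple heads, a position-wise feed-forward network, residual connections, and layer normalization, but the equation above is the core object most theoretical analyses study.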



Do you have any suggestions of papers with a theoretical analysis of transformers (attention)?

The most popular paper for attention …

