[R] Better & Faster Large Language Models via Multi-token Prediction
May 10, 2024, 9:59 a.m. | /u/EternalBlueFriday
Machine Learning | www.reddit.com
**Abstract**:
>Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict *multiple* future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we …
More from www.reddit.com / Machine Learning
[D] Does DSPy actually change the LM weights? (1 day, 2 hours ago | www.reddit.com)
[D] Culture of Recycling Old Conference Submissions in ML (1 day, 5 hours ago | www.reddit.com)
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US