May 22, 2024, 1:08 a.m. | /u/ai-lover

machinelearningnews www.reddit.com

Researchers at Gradient introduced the Llama-3 8B Gradient Instruct 1048k model, a notable advance in long-context language modeling. The model extends Llama-3 8B's native context window from 8K to over 1,048K tokens while requiring relatively little additional training. Using techniques such as NTK-aware interpolation of the rotary position embeddings and Ring Attention (both sketched below), the researchers improved training efficiency and speed, enabling the model to handle very long inputs without the performance degradation typically associated with extended contexts.
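
To make the first technique concrete, below is a minimal PyTorch sketch of NTK-aware RoPE interpolation: rather than linearly compressing position indices, the rotary base is scaled so that low-frequency components are interpolated while high-frequency components keep close to their original resolution. The function name, the 128x scale factor (roughly 1,048K / 8K), and the head dimension of 128 (Llama-3 8B's) are illustrative assumptions; the excerpt does not give the exact schedule Gradient used.

```python
import torch

def ntk_scaled_rope_frequencies(head_dim: int,
                                base: float = 10000.0,
                                scale: float = 128.0) -> torch.Tensor:
    """NTK-aware RoPE: stretch the rotary base so low frequencies are
    interpolated while high frequencies stay near original resolution.
    `scale` is the assumed context-extension ratio (8K -> 1048K is ~128x).
    """
    # Scale the base; the dim/(dim-2) exponent follows the NTK-aware derivation.
    ntk_base = base * scale ** (head_dim / (head_dim - 2))
    # Standard RoPE inverse frequencies, computed from the scaled base.
    return 1.0 / (ntk_base ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Example: rotary angles for the first 4096 positions of the extended context.
inv_freq = ntk_scaled_rope_frequencies(head_dim=128, scale=128.0)
positions = torch.arange(4096).float()
angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
```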

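The second technique, Ring Attention, shards the sequence across devices and passes key/value blocks around a ring, with each device computing blockwise attention via an online-softmax update so the full attention matrix is never materialized. The single-process simulation below is a hedged sketch of that blockwise accumulation under those assumptions, not Gradient's implementation; causal masking and the actual device-to-device communication are omitted for brevity.

```python
import torch

def ring_attention(q, k, v, num_blocks: int):
    """Blockwise (ring) attention simulated on one device.

    The sequence is split into `num_blocks` chunks; each query block
    accumulates attention over every key/value block in turn, using an
    online-softmax update so no full attention matrix is materialized.
    In the distributed setting each block lives on its own device and
    the K/V chunks rotate around a ring of devices.
    """
    seq_len, dim = q.shape
    block = seq_len // num_blocks
    scale = dim ** -0.5
    out = torch.zeros_like(q)

    for i in range(num_blocks):                  # query block (one "device")
        qi = q[i * block:(i + 1) * block]
        acc = torch.zeros(block, dim)            # running weighted sum of V
        row_max = torch.full((block, 1), float("-inf"))
        denom = torch.zeros(block, 1)            # running softmax denominator
        for j in range(num_blocks):              # K/V block arriving on the ring
            kj = k[j * block:(j + 1) * block]
            vj = v[j * block:(j + 1) * block]
            scores = (qi @ kj.T) * scale
            new_max = torch.maximum(
                row_max, scores.max(dim=-1, keepdim=True).values)
            correction = torch.exp(row_max - new_max)  # rescale old partials
            p = torch.exp(scores - new_max)
            denom = denom * correction + p.sum(dim=-1, keepdim=True)
            acc = acc * correction + p @ vj
            row_max = new_max
        out[i * block:(i + 1) * block] = acc / denom
    return out

# Example: 1024 tokens split across 8 simulated ring members.
q = k = v = torch.randn(1024, 64)
out = ring_attention(q, k, v, num_blocks=8)
```

Because each device only ever holds one query block plus the K/V chunk currently in flight, memory grows with the block size rather than the full sequence length, which is what makes million-token training tractable.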

