Jan. 16, 2024, 1 p.m. | code_your_own_AI


Self-Extend LLM: When LLMs encounter text sequences during inference that exceed the length of their pre-training context window, out-of-distribution (OOD) issues arise in the positional encoding.

Neural networks (NNs), and LLMs in particular, are susceptible to unpredictable behavior when dealing with OOD inputs. We analyse a new solution for increasing the context length of an LLM during inference!

Introducing grouped self-attention, which extends the classical self-attention of transformers beyond their pre-trained context length!
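The core idea of grouped self-attention can be sketched as a remapping of relative positions: tokens inside a local "neighbor" window keep their exact relative distances (normal self-attention), while more distant tokens have their relative positions floor-divided into groups, so even very long sequences never produce a relative position the model did not see in pre-training. A minimal sketch, assuming illustrative values for `neighbor_window` and `group_size` (these names and defaults are not from the video):

```python
def grouped_relative_position(q_pos: int, k_pos: int,
                              neighbor_window: int = 512,
                              group_size: int = 4) -> int:
    """Map a raw relative distance onto a position the model saw during
    pre-training: exact distances inside the neighbor window, grouped
    (floor-divided) distances beyond it."""
    rel = q_pos - k_pos  # raw relative distance (>= 0 under causal attention)
    if rel <= neighbor_window:
        # normal self-attention for nearby tokens: keep the exact distance
        return rel
    # grouped self-attention for distant tokens: merge positions into groups,
    # then shift so the mapping stays continuous at the window boundary
    shift = neighbor_window - neighbor_window // group_size
    return rel // group_size + shift
```

With `group_size = 4`, a model pre-trained on a 2k window could attend over roughly 4x longer sequences, since the largest remapped relative position grows four times more slowly than the raw distance. Choosing a larger `group_size` extends the reach further at the cost of coarser positional resolution for distant tokens.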

All rights w/ authors: …

