Oct. 3, 2023, 12:58 p.m. | /u/Successful-Western27


LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on texts longer than the context length they were trained on. I checked out a new paper investigating why windowed attention fails in this setting.

By visualizing the attention maps, the researchers noticed that LLMs heavily attend to the initial tokens as "attention sinks," even when those tokens are semantically meaningless. These sinks anchor the attention distribution: softmax has to put its probability mass somewhere, and the first tokens, visible to every later query under causal masking, become the default dumping ground.
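To make "attention sink" concrete, here's a toy harness for the kind of measurement the paper does on real models: compute causal softmax attention and check what fraction of each query's probability mass lands on the first few key positions. This sketch uses random Q/K, so it won't reproduce the sink effect itself (that emerges in trained LLMs); in practice you'd substitute attention maps extracted from an actual model. The function name and shapes here are my own, not from the paper.

```python
import numpy as np

def causal_attention_weights(q, k):
    """Single-head softmax attention with a causal mask (toy sketch)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                                    # (seq, seq)
    scores[np.triu(np.ones_like(scores, dtype=bool), 1)] = -np.inf   # hide future keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq, d, n_sinks = 64, 32, 4
attn = causal_attention_weights(rng.normal(size=(seq, d)), rng.normal(size=(seq, d)))

# Fraction of each query's attention mass that falls on the first n_sinks keys;
# on a trained LLM this concentration is what makes the initial tokens "sinks".
sink_mass = attn[:, :n_sinks].sum(axis=-1)
print(f"mean mass on first {n_sinks} tokens: {sink_mass.mean():.3f}")
```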

They realized that evicting these sink tokens from the KV cache warps the remaining attention scores, destabilizing the model's predictions.

Their proposed "StreamingLLM" method simply caches a few initial tokens (the attention sinks) permanently, alongside a sliding window of the most recent tokens, so the sinks are never evicted; a rough sketch of the idea is below.
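Here's a minimal sketch of that cache policy as I understand it: pin the KV entries of the first few tokens and keep a sliding window over everything after them. The class and parameter names (`SinkKVCache`, `n_sink`, `window`) are mine, and real implementations (including the paper's) also re-assign positions within the cache rather than keeping the tokens' original position indices.

```python
from collections import deque

class SinkKVCache:
    """StreamingLLM-style eviction policy (illustrative sketch):
    pin the first n_sink tokens' KV entries forever, keep a sliding
    window of the most recent `window` tokens, evict the middle."""

    def __init__(self, n_sink=4, window=1020):
        self.n_sink = n_sink
        self.sinks = []                      # attention sinks, never evicted
        self.recent = deque(maxlen=window)   # oldest non-sink entry drops first

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def contents(self):
        return self.sinks + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(f"kv_{t}")
# Sinks kv_0..kv_3 survive, the middle is evicted, only the last 8 remain.
print(cache.contents())
```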

