Oct. 3, 2023, 12:58 p.m. | /u/Successful-Western27


LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on texts longer than the context length they were trained on. I checked out a new paper investigating why windowed attention fails in this setting.

By visualizing the attention maps, the researchers noticed that LLMs heavily attend to the initial tokens as "attention sinks," even when those tokens are semantically meaningless. These sinks anchor the attention distribution: softmax has to put its probability mass somewhere, and the first tokens, visible to every later query under causal masking, become the default dumping ground.
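To make "attention sink" concrete, here's a toy harness for the kind of measurement the paper does on real models: compute causal softmax attention and check what fraction of each query's probability mass lands on the first few key positions. This sketch uses random Q/K, so it won't reproduce the sink effect itself (that emerges in trained LLMs); in practice you'd substitute attention maps extracted from an actual model. The function name and shapes here are my own, not from the paper.

```python
import numpy as np

def causal_attention_weights(q, k):
    """Single-head softmax attention with a causal mask (toy sketch)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                                    # (seq, seq)
    scores[np.triu(np.ones_like(scores, dtype=bool), 1)] = -np.inf   # hide future keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq, d, n_sinks = 64, 32, 4
attn = causal_attention_weights(rng.normal(size=(seq, d)), rng.normal(size=(seq, d)))

# Fraction of each query's attention mass that falls on the first n_sinks keys;
# on a trained LLM this concentration is what makes the initial tokens "sinks".
sink_mass = attn[:, :n_sinks].sum(axis=-1)
print(f"mean mass on first {n_sinks} tokens: {sink_mass.mean():.3f}")
```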

They realized that evicting these sink tokens from the KV cache warps the remaining attention scores, destabilizing the model's predictions.

Their proposed "StreamingLLM" method simply caches a few initial tokens (the attention sinks) permanently, alongside a sliding window of the most recent tokens, so the sinks are never evicted; a rough sketch of the idea is below.
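Here's a minimal sketch of that cache policy as I understand it: pin the KV entries of the first few tokens and keep a sliding window over everything after them. The class and parameter names (`SinkKVCache`, `n_sink`, `window`) are mine, and real implementations (including the paper's) also re-assign positions within the cache rather than keeping the tokens' original position indices.

```python
from collections import deque

class SinkKVCache:
    """StreamingLLM-style eviction policy (illustrative sketch):
    pin the first n_sink tokens' KV entries forever, keep a sliding
    window of the most recent `window` tokens, evict the middle."""

    def __init__(self, n_sink=4, window=1020):
        self.n_sink = n_sink
        self.sinks = []                      # attention sinks, never evicted
        self.recent = deque(maxlen=window)   # oldest non-sink entry drops first

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def contents(self):
        return self.sinks + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(f"kv_{t}")
# Sinks kv_0..kv_3 survive, the middle is evicted, only the last 8 remain.
print(cache.contents())
```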

