Jan. 16, 2024, 1 p.m. | code_your_own_AI


Self-Extend LLM: When LLMs encounter text sequences during inference that exceed the length of their pre-training context window, out-of-distribution (OOD) issues arise in the positional encoding.

Neural networks (NNs), and LLMs in particular, are susceptible to unpredictable behavior when dealing with OOD inputs. We analyse a new solution for increasing the context length of an LLM during inference!

Introducing grouped self-attention, which extends the classical self-attention of transformers beyond their pre-trained context length!
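The core idea of grouped self-attention can be sketched as a remapping of relative positions: tokens inside a local "neighbor" window keep their exact relative distances (normal self-attention), while more distant tokens have their relative positions floor-divided into groups, so even very long sequences never produce a relative position the model did not see in pre-training. A minimal sketch, assuming illustrative values for `neighbor_window` and `group_size` (these names and defaults are not from the video):

```python
def grouped_relative_position(q_pos: int, k_pos: int,
                              neighbor_window: int = 512,
                              group_size: int = 4) -> int:
    """Map a raw relative distance onto a position the model saw during
    pre-training: exact distances inside the neighbor window, grouped
    (floor-divided) distances beyond it."""
    rel = q_pos - k_pos  # raw relative distance (>= 0 under causal attention)
    if rel <= neighbor_window:
        # normal self-attention for nearby tokens: keep the exact distance
        return rel
    # grouped self-attention for distant tokens: merge positions into groups,
    # then shift so the mapping stays continuous at the window boundary
    shift = neighbor_window - neighbor_window // group_size
    return rel // group_size + shift
```

With `group_size = 4`, a model pre-trained on a 2k window could attend over roughly 4x longer sequences, since the largest remapped relative position grows four times more slowly than the raw distance. Choosing a larger `group_size` extends the reach further at the cost of coarser positional resolution for distant tokens.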

All rights w/ authors: …

