[R] Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

May 29, 2023, 12:41 p.m. | /u/asotos11

Machine Learning www.reddit.com

**Abstract:**

>Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines …
