Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration | allainews.com

April 19, 2024, 4:47 a.m. | Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.12022v1 Announce Type: new
Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computing capabilities of GPUs. In this paper, we propose a novel parallel decoding approach, namely \textit{hidden transfer}, which decodes multiple …

abstract arxiv autoregressive cs.cl decoding generate hidden however inference language language model language models large language large language model large language models latency llms parameters performance tasks token transfer type via

More from arxiv.org / cs.CL updates on arXiv.org

Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems 13 hours ago | arxiv.org

abstract arxiv context conversation +20

ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs) 13 hours ago | arxiv.org

abstract active learning arxiv chatgpt +22

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations 13 hours ago | arxiv.org

abstract arxiv commonsense cs.cl +10

Response: Emergent analogical reasoning in large language models 13 hours ago | arxiv.org

abstract acquired analogy arxiv +16

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization 13 hours ago | arxiv.org

abstract agents arxiv autonomous +18

NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance 13 hours ago | arxiv.org

abstract arxiv chinese cs.ce +25

CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions 13 hours ago | arxiv.org

abstract acquired arxiv collection +17

GOLD: Geometry Problem Solver with Natural Language Description 13 hours ago | arxiv.org

abstract artificial artificial intelligence arxiv +22

Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning 13 hours ago | arxiv.org

abstract arxiv autonomous cs.ai +17

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

#13721 - Data Engineer - AI Model Testing

@ Qualitest | Miami, Florida, United States

View on ai-jobs.net

Elasticsearch Administrator

@ ManTech | 201BF - Customer Site, Chantilly, VA

View on ai-jobs.net