April 5, 2024, 4:43 a.m. | Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

cs.LG updates on arXiv.org arxiv.org

arXiv:2311.08252v2 Announce Type: replace-cross
Abstract: We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens. This method draws from the reservoir of existing knowledge, retrieving and employing …

arxiv cs.ai cs.cl cs.ir cs.lg decoding rest retrieval type

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Data Engineer - New Graduate

@ Applied Materials | Milan,ITA

Lead Machine Learning Scientist

@ Biogen | Cambridge, MA, United States