March 18, 2024, 4:41 a.m. | Ziteng Sun, Jae Hun Ro, Ahmad Beirami, Ananda Theertha Suresh

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.10444v1 Announce Type: new
Abstract: Speculative decoding has been shown to be an effective method for lossless acceleration of large language models (LLMs) during inference. In each iteration, the algorithm first uses a smaller model to draft a block of tokens. The large model then verifies the drafted tokens in parallel, and only a subset of them is kept, which guarantees that the final output follows the distribution of the large model. In all of the prior speculative decoding …
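
The draft-then-verify step described in the abstract can be illustrated with a minimal sketch of the token-by-token acceptance rule commonly used in earlier speculative decoding work. This is not necessarily the method of this paper, whose abstract is truncated above. The function name `verify_block`, the plain-list representation of distributions, and the toy inputs are assumptions made for illustration only.

```python
import random


def verify_block(draft_tokens, q_dists, p_dists, rng):
    """Token-by-token verification of a drafted block (illustrative sketch).

    draft_tokens: token ids drafted by the small model (length k)
    q_dists: k probability lists from the small model, one per drafted position
    p_dists: k + 1 probability lists from the large model (the last one is
             used for an extra token when every drafted token is accepted)
    rng: a random.Random instance
    Returns the tokens kept in this iteration.
    """
    kept = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        # q[tok] > 0 because tok was sampled from q.
        if rng.random() < min(1.0, p[tok] / q[tok]):
            kept.append(tok)
            continue
        # On rejection, resample from the normalized residual max(p - q, 0)
        # and stop; later drafted tokens are discarded.
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        kept.append(rng.choices(range(len(p)), weights=residual)[0])
        return kept
    # All drafted tokens were accepted: sample one more from the large model.
    last = p_dists[len(draft_tokens)]
    kept.append(rng.choices(range(len(last)), weights=last)[0])
    return kept
```

When the two models agree exactly, every drafted token is accepted and the iteration yields k + 1 tokens. When they disagree, the first rejected position is replaced by a sample from the residual distribution, which is what keeps the output distributed according to the large model.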

