Feb. 6, 2024, 5:41 a.m. | Reyna Abhyankar Zijian He Vikranth Srivatsa Hao Zhang Yiying Zhang

cs.LG updates on arXiv.org arxiv.org

Large language models are increasingly integrated with external tools and APIs like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today's LLM inference systems are designed for standalone LLMs. They treat API calls as new requests, causing unnecessary recomputation of already computed contexts, which accounts for 37-40% of total model forwarding time. This paper presents APIServe, the first LLM inference framework targeting API-augmented LLMs. APISERVE minimizes the GPU resource waste caused by API calls and dedicates saved memory …

api apis beyond capability chatgpt chatgpt plugins cs.dc cs.lg inference inferencing language language model language models large language large language models llm llms plugins support systems tasks tools total

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Software Engineer, Machine Learning (Tel Aviv)

@ Meta | Tel Aviv, Israel

Senior Data Scientist- Digital Government

@ Oracle | CASABLANCA, Morocco