Microsoft Releases DeepSpeed-FastGen for High-Throughput Text Generation

Nov. 7, 2023, 5:56 p.m. | Andrew Hoblitzell

InfoQ - AI, ML & Data Engineering www.infoq.com

Microsoft has announced the alpha release of DeepSpeed-FastGen, a system designed to improve the deployment and serving of large language models (LLMs). DeepSpeed-FastGen combines DeepSpeed-MII and DeepSpeed-Inference and is built on the Dynamic SplitFuse technique. The system currently supports several model architectures.
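The summary names Dynamic SplitFuse without describing it. In the DeepSpeed team's description, the idea is to keep every forward pass near a fixed token budget: long prompts are split into chunks processed over several passes, and short prompts are fused with ongoing generation to fill the remaining budget. The following is a toy sketch of that scheduling idea only, not DeepSpeed code. The `Request` record, `schedule_step`, `run`, and the budget value are all hypothetical names chosen for illustration, and the sketch ignores details such as the token emitted when a prompt finishes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record for illustration; not a DeepSpeed type."""
    name: str
    prompt_left: int   # prompt tokens not yet processed
    decode_left: int   # tokens still to be generated


def schedule_step(requests, token_budget):
    """Choose (name, n_tokens) pairs for one forward pass, capped at token_budget."""
    batch = []
    budget = token_budget
    # Requests that are already generating contribute one token each.
    for r in requests:
        if budget == 0:
            break
        if r.prompt_left == 0 and r.decode_left > 0:
            batch.append((r.name, 1))
            r.decode_left -= 1
            budget -= 1
    # Fill the leftover budget with prompt chunks: a long prompt is split
    # across passes, and several short prompts can share one pass.
    for r in requests:
        if budget == 0:
            break
        if r.prompt_left > 0:
            chunk = min(r.prompt_left, budget)
            batch.append((r.name, chunk))
            r.prompt_left -= chunk
            budget -= chunk
    return batch


def run(requests, token_budget):
    """Schedule forward passes until every request is finished."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    steps = []
    while any(r.prompt_left or r.decode_left for r in requests):
        steps.append(schedule_step(requests, token_budget))
    return steps


if __name__ == "__main__":
    reqs = [Request("a", prompt_left=10, decode_left=2),
            Request("b", prompt_left=3, decode_left=1)]
    for i, step in enumerate(run(reqs, token_budget=8), start=1):
        print(i, step)
```

With a budget of 8, the 10-token prompt of request `a` is split over two passes, and the second pass is shared with the whole 3-token prompt of request `b`. No pass exceeds the budget.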



