Splitwise improves GPU usage by splitting LLM inference phases

Jan. 4, 2024, 5:02 p.m. | Brenda Potts

Expanded LLM use creates new demands on cloud GPU capacity. Splitwise presents an efficient solution by separating the two essential phases of LLM inference, achieving higher throughput within a limited power budget.

The post Splitwise improves GPU usage by splitting LLM inference phases appeared first on Microsoft Research.



