Aug. 17, 2023, 1:09 p.m. | MLOps.community

MLOps.community www.youtube.com

// Abstract
GPU-enabled hosts are a significant driver of cloud costs for teams serving LLMs in production. Preemptible instances can provide significant savings but generally aren’t fit for highly available services. This lightning talk tells the story of how Replit switched to preemptible GKE nodes, tamed the ensuing chaos, and saved buckets of cash while improving uptime.

// Bio
Replit engineer focused on reliable and scalable LLM infrastructure. Formerly YouTube's first SRE, a longtime Googler, and an early PayPal Linux guy.

