Preemption Chaos and Optimizing Server Startup // Bradley Heilbrun // LLMs in Prod Conference Part 2
Aug. 17, 2023, 1:09 p.m. | MLOps.community
GPU-enabled hosts are a significant driver of cloud costs for teams serving LLMs in production. Preemptible instances can deliver substantial savings but generally aren't suited to highly available services. This lightning talk tells the story of how Replit switched to preemptible GKE nodes, tamed the ensuing chaos, and saved buckets of cash while improving uptime.
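The approach the abstract describes, running GPU workloads on preemptible GKE capacity, can be sketched with the gcloud CLI. This is a minimal, hypothetical example: the cluster name, pool name, machine type, and accelerator choice are illustrative assumptions, not Replit's actual configuration.

```shell
# Hedged sketch: create a Spot (the successor to preemptible) GPU node pool
# on an existing GKE cluster. All names below are illustrative assumptions.
gcloud container node-pools create gpu-spot-pool \
  --cluster=llm-cluster \
  --spot \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --num-nodes=2
```

Spot nodes can be reclaimed with only a brief shutdown notice, which is the "chaos" the talk refers to: serving processes need fast startup and graceful SIGTERM handling so replacement pods become ready before availability suffers.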
// Bio
Replit engineer focused on reliable and scalable LLM infrastructure. Formerly YouTube's first SRE, a longtime Googler, and an early PayPal Linux engineer.