Preemption Chaos and Optimizing Server Startup // Bradley Heilbrun // LLMs in Prod Conference Part 2
Aug. 17, 2023, 1:09 p.m. | MLOps.community
GPU-enabled hosts are a significant driver of cloud costs for teams serving LLMs in production. Preemptible instances can provide substantial savings but generally aren't fit for highly available services. This lightning talk tells the story of how Replit switched to preemptible GKE nodes, tamed the ensuing chaos, and saved buckets of cash while improving uptime.
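The talk's core move, running GPU workloads on preemptible GKE nodes, can be sketched roughly as follows. The pool name, cluster name, zone, and sizing below are illustrative placeholders, not details from the talk:

```shell
# Create a preemptible GPU node pool (all names and sizes are hypothetical).
# Preemptible VMs cost a fraction of the on-demand price, but GCP can
# reclaim them at any time and they live at most 24 hours -- hence the
# "chaos" the talk describes, and the need for fast server startup.
gcloud container node-pools create llm-preemptible-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --preemptible \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --machine-type=n1-standard-8 \
  --enable-autoscaling --min-nodes=0 --max-nodes=8
```

GKE labels such nodes with `cloud.google.com/gke-preemptible=true`, so individual workloads can opt in to (or keep off of) the cheap capacity with a `nodeSelector` or node affinity in their pod spec.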
// Bio
Replit engineer focused on reliable and scalable LLM infrastructure. Formerly YouTube's first SRE, a longtime Googler, and an early PayPal Linux guy.