Aug. 17, 2023, 1:09 p.m. | MLOps.community

// Abstract
GPU-enabled hosts are a significant driver of cloud costs for teams serving LLMs in production. Preemptible instances can deliver substantial savings but generally aren't fit for highly available services. This lightning talk tells the story of how Replit switched to preemptible GKE nodes, tamed the ensuing chaos, and saved buckets of cash while improving uptime.
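The approach the abstract describes — running GPU serving workloads on preemptible GKE nodes — can be sketched with a gcloud command along these lines. The cluster name, pool name, and GPU type here are illustrative assumptions, not details from the talk:

```shell
# Hypothetical sketch: add a preemptible GPU node pool to an existing
# GKE cluster. Names and the GPU type are placeholders.
gcloud container node-pools create llm-serving-preemptible \
  --cluster=my-cluster \
  --preemptible \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --num-nodes=3 \
  --node-taints=cloud.google.com/gke-preemptible="true":NoSchedule
```

Serving pods would then need a matching toleration so only preemption-tolerant workloads land on these nodes; keeping a small on-demand pool as a fallback is one common way to preserve availability when preemptions cluster together.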

// Bio
Replit engineer focused on reliable and scalable LLM infrastructure. Formerly YouTube's first SRE, a longtime Googler, and an early PayPal Linux guy.
