Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // LLM 3 Talk 3
Oct. 25, 2023, 10:13 a.m. | MLOps.community (www.youtube.com)
Getting the right LLM inference stack means choosing the right model for your task and running it on the right hardware with proper inference code. This talk goes through popular inference stacks and setups, detailing what makes inference costly. We'll talk about the current generation of open-source models and how to make the best use of them, but we will also touch on features currently missing from the open-source serving stack as well as what the future …
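The latency/throughput/cost trade-off the abstract refers to can be sketched with a standard back-of-envelope model: at small batch sizes, autoregressive decoding is memory-bandwidth bound, since every generated token must stream the full set of model weights from GPU memory. The formula below is that common approximation; the model size, bandwidth, and GPU price are illustrative assumptions, not figures from the talk.

```python
# Back-of-envelope decode cost model for an LLM served on a single GPU.
# All concrete numbers below are illustrative assumptions, not measurements.

def decode_tokens_per_second(params_billions: float, mem_bandwidth_gbs: float) -> float:
    """Upper bound on batch-1 decode speed for a bandwidth-bound model.
    With fp16 weights (2 bytes/param), each token requires reading all
    weights once, so tokens/s ~= bandwidth / (2 * params)."""
    bytes_per_token = 2 * params_billions * 1e9  # fp16 weight read per token
    return (mem_bandwidth_gbs * 1e9) / bytes_per_token

def cost_per_million_tokens(tok_per_s: float, gpu_dollars_per_hour: float) -> float:
    """Convert throughput and an hourly GPU price into $ per 1M generated tokens."""
    tokens_per_hour = tok_per_s * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1e6

# Hypothetical example: a 7B-parameter model on a GPU with ~2 TB/s of
# memory bandwidth, rented at $2/hour.
tps = decode_tokens_per_second(7, 2000)
print(f"{tps:.0f} tok/s, ${cost_per_million_tokens(tps, 2.0):.2f} per 1M tokens")
```

Batching changes the picture: serving many requests at once amortizes each weight read across the whole batch, trading per-request latency for much lower cost per token, which is the core tension the talk's title names.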