[D] How are the popular LLM API servings optimized?
Nov. 6, 2023, 4:22 p.m. | /u/shreyansh26
Machine Learning www.reddit.com
They seem pretty fast, especially at the 30B+ model sizes. Does anyone know how these are optimized? Apart from horizontal scaling across GPUs and probably dynamic batching (assuming request volume is high), what else are these companies doing?
Some of these companies also released their APIs the very next day after the models came out - …
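For readers unfamiliar with the dynamic batching mentioned above, here is a minimal toy sketch of the idea: incoming requests are queued, and the server flushes a batch to the model either when the batch is full or after a short wait, so one GPU forward pass serves many requests. All names here (`DynamicBatcher`, `serve_one_batch`, the dict-based request slots) are illustrative assumptions, not any particular provider's API; production servers typically do continuous batching at the token level instead.

```python
import queue
import threading
import time


class DynamicBatcher:
    """Toy dynamic batcher (illustrative only): groups incoming requests
    into batches, flushing when the batch is full or a timeout elapses."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()

    def submit(self, prompt):
        # Each request carries an Event so the caller can wait on its result.
        slot = {"prompt": prompt, "done": threading.Event(), "result": None}
        self.requests.put(slot)
        return slot

    def _collect_batch(self):
        # Block for the first request, then greedily gather more until
        # the batch is full or max_wait_s has elapsed.
        batch = [self.requests.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

    def serve_one_batch(self, model_fn):
        # One (simulated) GPU forward pass handles the whole batch at once.
        batch = self._collect_batch()
        outputs = model_fn([s["prompt"] for s in batch])
        for slot, out in zip(batch, outputs):
            slot["result"] = out
            slot["done"].set()
        return len(batch)


if __name__ == "__main__":
    batcher = DynamicBatcher(max_batch_size=4, max_wait_s=0.05)
    fake_model = lambda prompts: [p.upper() for p in prompts]  # stand-in model

    slots = [batcher.submit(f"req{i}") for i in range(4)]
    served = batcher.serve_one_batch(fake_model)
    print(served, [s["result"] for s in slots])
```

The key latency/throughput knob is `max_wait_s`: waiting longer yields bigger batches (better GPU utilization) at the cost of added per-request latency. Continuous batching, as in vLLM, improves on this by admitting new requests into the batch between decode steps rather than only between full batches.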