[D] How are the popular LLM API servings optimized?
Nov. 6, 2023, 4:22 p.m. | /u/shreyansh26
Machine Learning www.reddit.com
They seem pretty fast, especially at the 30B+ model sizes. Does anyone know how these are optimized? Apart from horizontal scaling across GPUs and probably dynamic batching (assuming request volume is high), what else are these companies doing?
Some of these companies also released their APIs the very next day after the models came out - …
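For readers unfamiliar with the dynamic batching mentioned above, here is a minimal toy sketch of the idea: incoming requests are queued, and the server flushes a batch to the model either when the batch is full or after a short wait, so one GPU forward pass serves many requests. All names here (`DynamicBatcher`, `serve_one_batch`, the dict-based request slots) are illustrative assumptions, not any particular provider's API; production servers typically do continuous batching at the token level instead.

```python
import queue
import threading
import time


class DynamicBatcher:
    """Toy dynamic batcher (illustrative only): groups incoming requests
    into batches, flushing when the batch is full or a timeout elapses."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()

    def submit(self, prompt):
        # Each request carries an Event so the caller can wait on its result.
        slot = {"prompt": prompt, "done": threading.Event(), "result": None}
        self.requests.put(slot)
        return slot

    def _collect_batch(self):
        # Block for the first request, then greedily gather more until
        # the batch is full or max_wait_s has elapsed.
        batch = [self.requests.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

    def serve_one_batch(self, model_fn):
        # One (simulated) GPU forward pass handles the whole batch at once.
        batch = self._collect_batch()
        outputs = model_fn([s["prompt"] for s in batch])
        for slot, out in zip(batch, outputs):
            slot["result"] = out
            slot["done"].set()
        return len(batch)


if __name__ == "__main__":
    batcher = DynamicBatcher(max_batch_size=4, max_wait_s=0.05)
    fake_model = lambda prompts: [p.upper() for p in prompts]  # stand-in model

    slots = [batcher.submit(f"req{i}") for i in range(4)]
    served = batcher.serve_one_batch(fake_model)
    print(served, [s["result"] for s in slots])
```

The key latency/throughput knob is `max_wait_s`: waiting longer yields bigger batches (better GPU utilization) at the cost of added per-request latency. Continuous batching, as in vLLM, improves on this by admitting new requests into the batch between decode steps rather than only between full batches.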