Nov. 6, 2023, 4:22 p.m. | /u/shreyansh26

Machine Learning www.reddit.com

There are currently a ton of offerings of various large language models hosted by companies like Together AI, Perplexity, Replit, and many others.

They seem pretty fast, especially at the 30B+ model sizes. Does anyone know how these are optimized? Apart from horizontal scaling across GPUs and probably dynamic batching (assuming requests arrive in large numbers), what else are these companies doing?
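For context on the dynamic batching mentioned above, here is a toy sketch of the idea: queue incoming requests and flush them as one batch when either the batch fills up or a short timeout expires, so each GPU forward pass serves many requests at once. This is purely illustrative, not any provider's actual implementation; the `DynamicBatcher` class and `model_fn` callback are made-up names.

```python
import queue
import threading
import time

class DynamicBatcher:
    """Toy dynamic batcher: requests are queued and flushed together when
    the batch is full or a short timeout expires, amortizing the cost of
    one batched model call across many requests. (Illustrative only.)"""

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.01):
        self.model_fn = model_fn            # batched inference function (hypothetical)
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, prompt):
        """Enqueue a prompt; returns an Event plus a dict the result lands in."""
        done, result = threading.Event(), {}
        self.requests.put((prompt, done, result))
        return done, result

    def _loop(self):
        while True:
            batch = [self.requests.get()]          # block until the first request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            # One batched call for the whole group, then fan results back out.
            outputs = self.model_fn([p for p, _, _ in batch])
            for (_, done, result), out in zip(batch, outputs):
                result["output"] = out
                done.set()

# Usage: a fake "model" that upper-cases prompts stands in for a batched
# forward pass on the GPU.
batcher = DynamicBatcher(lambda prompts: [p.upper() for p in prompts])
handles = [batcher.submit(p) for p in ["hello", "world"]]
answers = []
for done, result in handles:
    done.wait()
    answers.append(result["output"])
```

Production systems go further than this (continuous batching at the iteration level, so new requests join a batch between decode steps rather than waiting for the whole batch to finish), but the queue-and-flush loop above is the basic shape.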

Some of these companies also released their APIs the very next day the models came out - …

