May 4, 2024, 4:43 p.m. | /u/gamerx88

Machine Learning www.reddit.com

For people who have worked on self-hosting their own models, I'm curious about your tech stack and architecture for model serving, especially for those serving models larger than 30B. What optimizations and stack do you find effective in an environment where request volume is volatile (e.g., it can spike 10x in minutes) but responsiveness needs to stay high?
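Not from the thread itself, but as an illustration of one common answer to the spike-vs-latency tension the question raises: a bounded admission queue that sheds excess requests during a burst, so admitted requests keep low latency instead of everything timing out. The `AdmissionController` name and parameters here are hypothetical, a minimal sketch rather than any particular serving stack's API.

```python
from collections import deque


class AdmissionController:
    """Bounded request queue with load shedding.

    When traffic spikes beyond capacity, excess requests are rejected
    immediately (a fast 429) instead of queueing, which keeps latency
    bounded for the requests that are admitted.
    """

    def __init__(self, max_queue_depth: int):
        self.max_queue_depth = max_queue_depth
        self.queue = deque()

    def try_admit(self, request_id: str) -> bool:
        # Shed load once the queue is full -- a fast rejection beats
        # a slow timeout from the client's point of view.
        if len(self.queue) >= self.max_queue_depth:
            return False
        self.queue.append(request_id)
        return True

    def pop_next(self):
        # Called by the serving loop to pull the next admitted request.
        return self.queue.popleft() if self.queue else None


ctrl = AdmissionController(max_queue_depth=2)
# Simulate a burst of 4 requests against capacity for 2.
results = [ctrl.try_admit(f"req-{i}") for i in range(4)]
print(results)  # first two admitted, rest shed
```

In practice a real stack would combine this kind of admission control with continuous batching and autoscaling, but the principle is the same: cap queue depth so tail latency stays predictable under a spike.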

