[D] Serverless Inference for Llama2
Aug. 25, 2023, 10:19 a.m. | /u/MiNeves
Machine Learning www.reddit.com
I am part of a small, startup-like organization and want to use a model to answer client requests, but these will not arrive 24/7, so I started looking at serverless inference. I have been warned about cold-start times, since the desired latency is about 1-5 seconds. I am using a Llama2-7b-GPTQ (quantized) model and am also experimenting with the 13b version. The model weights take about 10 GB of memory. I still do not have …
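A rough sanity check of why cold starts matter here: a back-of-envelope sketch of cold-start time for a ~10 GB model. The bandwidth and container-overhead figures below are illustrative assumptions, not measurements from any particular serverless provider.

```python
# Back-of-envelope cold-start estimate for a ~10 GB quantized model.
# All bandwidth/overhead numbers are assumed for illustration.

def cold_start_seconds(model_gb: float, bandwidth_gbps: float,
                       container_overhead_s: float) -> float:
    """Time to pull model weights into memory plus container spin-up."""
    return model_gb / bandwidth_gbps + container_overhead_s

# e.g. pulling 10 GB from network storage at ~1 GB/s, with ~10 s of
# container/runtime startup:
print(cold_start_seconds(10, 1.0, 10))  # 20.0 s -- far above a 1-5 s budget
```

Under these (assumed) numbers, a cold request alone would take roughly 20 s, which is why keeping at least one warm instance, or a provider with weight caching, is usually needed to hit a 1-5 s latency target.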