Aug. 25, 2023, 10:19 a.m. | /u/MiNeves

Machine Learning www.reddit.com

Serverless Inference for Llama2

I am part of a small (startup-like) organization and want to use a model to answer client requests, but it does not need to run 24/7, so I started looking at serverless inference. I have been warned about cold start times, since the desired latency is about 1-5 seconds. I am using a Llama2-7b-GPTQ model (quantized) and am also experimenting with the 13b version. The model weights take about 10GB of memory. I still do not have …
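
For reference, here is a minimal sketch of how such a serverless worker is often structured so that only the cold start pays the weight load. It assumes a RunPod-style serverless runtime and the `TheBloke/Llama-2-7B-GPTQ` checkpoint; the runtime choice, checkpoint name, and request shape are illustrative assumptions, not something stated in the post:

```python
# Sketch: load the quantized model once at module import so that warm
# invocations skip the ~10GB weight load; only cold starts pay for it.
# Assumes a RunPod-style serverless runtime; loading GPTQ weights through
# transformers additionally requires the optimum and auto-gptq packages.
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TheBloke/Llama-2-7B-GPTQ"  # illustrative quantized checkpoint

# Module-level load: runs once per container, not once per request.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def handler(event):
    # RunPod delivers the request payload under event["input"];
    # the "prompt" key is an assumed request schema for this sketch.
    prompt = event["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256)
    return {"completion": tokenizer.decode(output_ids[0], skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})
```

With the load at module scope, a warm invocation only pays tokenization and generation, which for short completions can fit the 1-5 second budget; the cold path is dominated by pulling the ~10GB of weights, so keeping a minimum of one warm worker (or relying on provider-side weight caching) is typically what closes the gap.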
