Results from Deploying Quantized version of SOLAR 10.7B-Instruct
Jan. 5, 2024, 11:39 a.m. | /u/Tiny_Cut_8440
machinelearningnews www.reddit.com
Been working on optimizing Upstage's SOLAR-10.7B-Instruct-v1.0 model and wanted to share our insights:
🚀 **Our Approach:** Quantized the model using Auto-GPTQ, then deployed with vLLM.
Results: In a serverless setup on an NVIDIA A100 GPU, we measured 1.37 s inference latency, 111.54 tokens/sec throughput, and an 11.69 s cold start.
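As a quick sanity check on those numbers (assuming 1.37 s is warm end-to-end generation latency and 111.54 tok/s is per-request decode throughput, not a batched aggregate), the implied output length works out to roughly 153 tokens per request, and the cold start dominates a first request by about 8.5×:

```python
# Back-of-envelope check on the reported serverless metrics.
# Assumption: 1.37 s is warm end-to-end latency and 111.54 tok/s
# is per-request decode throughput (not batched aggregate).
warm_latency_s = 1.37
throughput_tok_s = 111.54
cold_start_s = 11.69

implied_tokens = warm_latency_s * throughput_tok_s
print(f"Implied tokens per request: {implied_tokens:.0f}")  # ~153

# How much the cold start inflates the first request:
print(f"Cold start / warm latency: {cold_start_s / warm_latency_s:.1f}x")  # ~8.5x
```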
https://preview.redd.it/eyym3rc3zlac1.png?width=1600&format=png&auto=webp&s=5846a8b2eb4cf6d9cd8f12545d498c37d3653056
Other methods tested: We also tried serving the quantized model directly through Auto-GPTQ, but in our experience vLLM proved the better choice for deployment.
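For anyone wanting to reproduce the deployment path, a minimal sketch with vLLM's offline API looks like the following. This is an assumption-laden illustration, not the poster's exact setup: the model path is a placeholder for the output of your own Auto-GPTQ run (or a prebuilt GPTQ repo), it requires a CUDA GPU with vllm installed, and the prompt template follows SOLAR-Instruct's `### User:` / `### Assistant:` format.

```python
# Sketch: serving a GPTQ-quantized SOLAR checkpoint with vLLM.
# Assumes a GPU host with vllm installed; the model path below is
# illustrative -- point it at your Auto-GPTQ output directory or a
# Hugging Face GPTQ repo.
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/to/solar-10.7b-instruct-gptq",  # placeholder path
    quantization="gptq",   # load the GPTQ weights instead of fp16
    dtype="float16",
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["### User:\nHello\n\n### Assistant:\n"], params)
print(outputs[0].outputs[0].text)
```

Routing the GPTQ checkpoint through vLLM rather than Auto-GPTQ's own generation path gets you paged attention and continuous batching, which is the usual reason vLLM wins on serving throughput.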
Looking forward to hearing about your experiences with similar projects!