Jan. 4, 2024, 1:11 p.m. | /u/Tiny_Cut_8440 | r/MachineLearning (www.reddit.com)

Hello everyone,

Been working on optimizing Upstage's SOLAR-10.7B-Instruct-v1.0 model and wanted to share our insights:

🚀 **Our Approach:** We quantized the model with Auto-GPTQ, then served the quantized checkpoint with vLLM.
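For anyone who wants to try the quantization step, here's a minimal sketch with Auto-GPTQ. The 4-bit/group-size-128 settings, the output path, and the one-sample calibration set are illustrative defaults on my part, not necessarily our exact config:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Illustrative GPTQ settings -- common defaults, adjust for your accuracy/speed tradeoff
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weight quantization
    group_size=128,  # per-group quantization granularity
    desc_act=False,  # faster inference at a small accuracy cost
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ is calibration-based: in practice you'd pass a few hundred real samples here
examples = [
    tokenizer("Auto-GPTQ quantizes weights using the GPTQ algorithm.", return_tensors="pt")
]
model.quantize(examples)
model.save_quantized("solar-10.7b-instruct-gptq", use_safetensors=True)
```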

**Results:** In a serverless setup on an NVIDIA A100 GPU, we saw 1.37 s inference latency, 111.54 tokens/sec throughput, and an 11.69 s cold start.

https://preview.redd.it/kel8cn5dafac1.png?width=1600&format=png&auto=webp&s=5bca8b5e4a48f5f7a709f44bc431844746c61a77
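If you want to sanity-check numbers like these yourself, here's roughly how to measure single-request latency and decode throughput with vLLM. The checkpoint path is the placeholder from the sketch above, and the prompt just follows SOLAR's instruct template; cold start is a property of the serverless platform (container spin-up plus model load), so it has to be measured at the endpoint level rather than in-process:

```python
import time
from vllm import LLM, SamplingParams

# Load the GPTQ checkpoint produced above (path is a placeholder)
llm = LLM(model="solar-10.7b-instruct-gptq", quantization="gptq", dtype="float16")

prompt = "### User:\nExplain GPTQ quantization in one paragraph.\n\n### Assistant:\n"
params = SamplingParams(temperature=0.0, max_tokens=200)

start = time.perf_counter()
out = llm.generate([prompt], params)[0]
elapsed = time.perf_counter() - start

n_generated = len(out.outputs[0].token_ids)
print(f"latency: {elapsed:.2f} s, throughput: {n_generated / elapsed:.2f} tokens/sec")
```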

**Other Methods Tested:** Serving the quantized model directly through Auto-GPTQ's own runtime was also an option, but in our experience vLLM is the better choice for deployment.
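For reference, this is roughly what the Auto-GPTQ serving path looks like (same placeholder checkpoint path as above). It works, but vLLM was noticeably faster for us:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the quantized checkpoint through Auto-GPTQ's own runtime
tokenizer = AutoTokenizer.from_pretrained("solar-10.7b-instruct-gptq")
model = AutoGPTQForCausalLM.from_quantized("solar-10.7b-instruct-gptq", device="cuda:0")

inputs = tokenizer(
    "### User:\nHello!\n\n### Assistant:\n", return_tensors="pt"
).to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```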

Looking forward to hearing about your experiences with similar projects!
