Jan. 5, 2024, 11:39 a.m. | /u/Tiny_Cut_8440 | r/machinelearningnews

Hello everyone,

I've been working on optimizing Upstage's SOLAR-10.7B-Instruct-v1.0 model and wanted to share our insights:

🚀 **Our Approach:** Quantized the model using Auto-GPTQ, then deployed it with vLLM. Rough sketch of the quantization step below.
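For anyone wanting to reproduce the first step, here's a minimal sketch using the `auto-gptq` library. The 4-bit / group-size-128 settings, the output path, and the tiny calibration set are my assumptions; the post doesn't pin down the exact config:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"

# Assumed quantization settings; not stated in the post above.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization granularity
    desc_act=False,  # skip activation-order reordering for faster inference
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs a calibration set; in practice use a few hundred
# representative prompts, not this single toy example.
examples = [tokenizer("Serverless LLM inference benefits from quantized weights.")]
model.quantize(examples)

model.save_quantized("solar-10.7b-instruct-gptq-4bit")
tokenizer.save_pretrained("solar-10.7b-instruct-gptq-4bit")
```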

**Results:** In a serverless setup on an NVIDIA A100 GPU, we measured 1.37 s inference latency, 111.54 tokens/sec throughput, and an 11.69 s cold start.
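And a sketch of the serving side with vLLM, assuming the quantized checkpoint from the step above; the prompt, sampling parameters, and timing harness are placeholders of mine, not the authors' benchmark code:

```python
import time

from vllm import LLM, SamplingParams

# Point vLLM at the GPTQ checkpoint produced above (path is an assumption).
llm = LLM(model="solar-10.7b-instruct-gptq-4bit", quantization="gptq")

params = SamplingParams(temperature=0.7, max_tokens=256)
prompt = "Explain the trade-offs of 4-bit quantization in one paragraph."

# Crude single-request latency/throughput measurement, similar in spirit
# to the numbers quoted above (cold start excluded).
start = time.perf_counter()
outputs = llm.generate([prompt], params)
elapsed = time.perf_counter() - start

generated = outputs[0].outputs[0]
n_tokens = len(generated.token_ids)
print(f"{elapsed:.2f} s, {n_tokens / elapsed:.2f} tokens/sec")
print(generated.text)
```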

Benchmark screenshot: https://preview.redd.it/eyym3rc3zlac1.png?width=1600&format=png&auto=webp&s=5846a8b2eb4cf6d9cd8f12545d498c37d3653056

**Other methods tested:** Serving the quantized model directly through Auto-GPTQ's own inference path was also an option, but in our experience vLLM was the clearly better choice for deployment.

Looking forward to hearing about your experiences with similar projects!

