Feb. 1, 2024, 1:10 a.m. | /u/Tiny_Cut_8440 on r/machinelearningnews

Hi everyone,

Recently experimented with deploying the Mixtral-8x7B model and wanted to share key findings for those interested:

**Best Performance:** With the model quantized to 8-bit and running on PyTorch (nightly), I got an average generation rate of 52.03 tokens/sec on a single A100, an average inference time of 4.94 seconds, and a cold start of 11.48 seconds (cold start matters when deploying in a serverless environment). A minimal sketch of the setup follows below.
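In case it's useful, here is a minimal sketch of that kind of setup using Hugging Face transformers with bitsandbytes 8-bit loading, plus a simple tokens/sec measurement. This is not the tutorial's exact code; the model ID, prompt, and generation length are illustrative.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative model ID; the post benchmarks Mixtral-8x7B on an A100.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 8-bit quantization via bitsandbytes. The post used a PyTorch nightly
# build; any recent PyTorch should work for this sketch.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)

prompt = "Explain mixture-of-experts models in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Time one generation pass and derive a rough tokens/sec figure.
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/sec, {elapsed:.2f} s total")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```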


**Other Libraries Tested:** vLLM, AutoGPTQ, HQQ
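For contrast, here is an equivalent minimal vLLM sketch (again illustrative; the configurations actually benchmarked in the tutorial may differ):

```python
from vllm import LLM, SamplingParams

# vLLM serves the model with continuous batching. Note that Mixtral-8x7B in
# fp16 does not fit a single 80 GB A100, so in practice a quantized
# checkpoint (e.g. a GPTQ variant) would typically be used here.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=1)
params = SamplingParams(max_tokens=256, temperature=0.0)

outputs = llm.generate(["Explain mixture-of-experts models."], params)
print(outputs[0].outputs[0].text)
```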

Here is the link to the tutorial - [https://tutorials.inferless.com/deploy-mixtral-8x7b-for-52-tokens-sec-on-a-single-gpu](https://tutorials.inferless.com/deploy-mixtral-8x7b-for-52-tokens-sec-on-a-single-gpu)

Keen to hear about your experiences and lessons learned from similar deployments!
