Feb. 1, 2024, 1:10 a.m. | /u/Tiny_Cut_8440

Posted in r/machinelearningnews (www.reddit.com)

Hi everyone,

I recently experimented with deploying the Mixtral-8x7B model and wanted to share the key findings for anyone interested:

**Best Performance:** With the model quantized to 8-bit and running on PyTorch (nightly), I got an average token generation rate of 52.03 tokens/sec on a single A100, an average inference time of 4.94 seconds, and a cold start of 11.48 seconds (the cold start matters when deploying in a serverless environment).

![Benchmark results](https://preview.redd.it/93l5oydhjvfc1.png?width=1600&format=png&auto=webp&s=300e6d690d3de995db86fedf633bec25d149b935)
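For anyone who wants a starting point before reading the full tutorial, here is a minimal sketch of one common way to load an 8-bit Mixtral and measure tokens/sec. The post doesn't include code, so the model ID, prompt, and generation settings below are my assumptions, and I'm using transformers + bitsandbytes for the 8-bit path rather than the exact setup from the benchmark:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint; swap in whichever Mixtral variant you deploy.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 8-bit quantization via bitsandbytes -- one common route to an 8-bit Mixtral;
# the exact quantization path used in the benchmark is in the linked tutorial.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # places layers automatically (requires accelerate)
)

prompt = "Explain mixture-of-experts models in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Time one generation and derive tokens/sec, mirroring the metric above.
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.2f} tokens/sec")
```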

**Other Libraries Tested:** vLLM, AutoGPTQ, HQQ
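If you want to reproduce a vLLM comparison point, a minimal sketch looks like the following. This is not the benchmark's exact configuration: the model ID and sampling settings are assumptions, and an unquantized fp16 Mixtral-8x7B (~47B params) won't fit on a single A100, so you'd need either two GPUs or a quantized (e.g. GPTQ/AWQ) checkpoint:

```python
from vllm import LLM, SamplingParams

# fp16 Mixtral-8x7B weighs ~90 GB, so a single 80 GB A100 can't hold it;
# use tensor_parallel_size=2 or point `model` at a GPTQ/AWQ-quantized checkpoint.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed checkpoint
    tensor_parallel_size=2,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts models in one paragraph."], params
)

for out in outputs:
    print(out.outputs[0].text)
```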

Here is the link to the tutorial: [https://tutorials.inferless.com/deploy-mixtral-8x7b-for-52-tokens-sec-on-a-single-gpu](https://tutorials.inferless.com/deploy-mixtral-8x7b-for-52-tokens-sec-on-a-single-gpu)

Keen to hear your experiences and learnings in similar deployments!

