Jan. 30, 2024, 12:40 a.m. | /u/Tiny_Cut_8440


Hi everyone,

I recently experimented with deploying the Mixtral-8x7B model and wanted to share key findings for those interested:

**Best Performance:** With the 8-bit quantized model on PyTorch (nightly), I got an average token generation rate of 52.03 tokens/sec on an A100, an average inference time of 4.94 seconds, and an 11.48-second cold start (which matters when deploying in a serverless environment).

[Mixtral Experiments](https://preview.redd.it/i7mbjzl74hfc1.png?width=1600&format=png&auto=webp&s=1bb27c889d3b76a50b33cd549a7156702b5b4ae3)
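For anyone wanting to try this, here is a minimal sketch of 8-bit quantized Mixtral inference. The post doesn't specify the exact loading path, so the Hugging Face transformers + bitsandbytes route below is an assumption, and the prompt and generation settings are illustrative:

```python
# Minimal sketch: 8-bit quantized Mixtral-8x7B inference.
# Assumption: loading via Hugging Face transformers + bitsandbytes;
# the original experiments may have used a different path.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # spread layers across available GPU(s)
)

prompt = "Explain mixture-of-experts routing in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Rough throughput measurement (generation settings are illustrative).
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/sec, {elapsed:.2f}s total")
```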

**Other Libraries Tested:** vLLM, AutoGPTQ, HQQ
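For reference, a minimal vLLM offline-inference sketch looks like the following; the model ID, tensor-parallel size, and sampling settings are illustrative assumptions, not the exact configuration from these experiments:

```python
# Minimal vLLM sketch for comparison runs.
# Assumption: tensor_parallel_size=2 is illustrative; pick it to fit
# your GPU memory for Mixtral-8x7B.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,
)
params = SamplingParams(max_tokens=256, temperature=0.7)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```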

Keen to hear your experiences and learnings in similar deployments!
