April 15, 2024, 11:57 a.m. | /u/themathstudent

r/MachineLearning (www.reddit.com)

Assuming I have a GPU that can load a 7B model without any compression, is 4-bit quantization faster for inference? Or do the 4-bit weights need to be dequantized on the fly, making 4-bit quantization slower?

Here is sample code to load Mistral with 4-bit quantization.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = "cuda"  # the device to load the model onto
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit NF4 weights with double quantization; matmuls run in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
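One way to answer this empirically is to time generation for both the bf16 and 4-bit variants on the same prompt. Below is a rough timing sketch, not a rigorous benchmark; the prompt and token counts are placeholders, and it assumes both variants fit in your VRAM:

```python
# Rough comparison of generation speed: bf16 baseline vs 4-bit NF4.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to("cuda")

def tokens_per_second(model, n_tokens=128):
    # Greedy-decode a fixed number of tokens and report a rough tokens/sec figure.
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=n_tokens, do_sample=False)
    torch.cuda.synchronize()
    return n_tokens / (time.perf_counter() - start)

# bf16 baseline (no quantization)
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
print("bf16 tok/s: ", tokens_per_second(model_bf16))
del model_bf16
torch.cuda.empty_cache()

# 4-bit NF4: weights are stored compressed and dequantized inside the matmul kernels
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
print("4-bit tok/s:", tokens_per_second(model_4bit))
```

In general the answer depends on whether the reduced memory traffic from 4-bit weights outweighs the per-layer dequantization overhead on your particular GPU and batch size, so measuring both on your own hardware is the most reliable check.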
