April 15, 2024, 11:57 a.m. | /u/themathstudent

r/MachineLearning (www.reddit.com)

Assuming I had a GPU that could load a 7B model without compressing it, I just wanted to know whether 4-bit quantization is faster for inference. Or do the 4-bit weights need to be dequantized on the fly, making 4-bit quantization slower?

Here is sample code to load Mistral with 4-bit quantization.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# NF4 4-bit quantization with double quantization; matmuls run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```
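Whether the NF4 path ends up faster or slower than plain bfloat16 depends on your GPU and the bitsandbytes kernels, so the most reliable answer is to time both configurations yourself. Below is a minimal benchmark sketch under assumed settings (the prompt, run count, and token budget are arbitrary choices, not from the original post); it prints a rough tokens-per-second figure for each model.
```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Explain quantization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")


def tokens_per_second(model, n_runs=3, max_new_tokens=128):
    # Warm-up run so CUDA kernels and caches are initialized before timing.
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        # min_new_tokens forces the full token budget so the timing is comparable.
        model.generate(
            **inputs, max_new_tokens=max_new_tokens, min_new_tokens=max_new_tokens
        )
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / n_runs
    return max_new_tokens / elapsed


# Full-precision (bfloat16) baseline.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
print(f"bf16: {tokens_per_second(model_bf16):.1f} tok/s")
del model_bf16
torch.cuda.empty_cache()

# NF4 4-bit quantized model via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
print(f"nf4:  {tokens_per_second(model_4bit):.1f} tok/s")
```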

