[D] Inference speed of 4-bit vs float16
April 15, 2024, 11:57 a.m. | /u/themathstudent
Machine Learning | www.reddit.com
Here is sample code to load Mistral in 4-bit.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = "cuda"  # the device to load the model onto
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit NF4 quantization with double quantization; matmuls computed in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```
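To actually compare inference speed against float16, you need a baseline loaded without quantization and a timing loop. Here is a minimal sketch, assuming a single CUDA GPU and the same model ID; the prompt, `max_new_tokens`, and warm-up settings are illustrative choices, not from the original post.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# float16 baseline: no quantization config, weights loaded as fp16
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain quantization in one sentence."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model_fp16.device)

# Warm up once, then time a fixed-length generation to estimate tokens/sec
model_fp16.generate(**inputs, max_new_tokens=8)
torch.cuda.synchronize()
start = time.perf_counter()
out = model_fp16.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"fp16: {new_tokens / elapsed:.1f} tokens/sec")
```

Running the same warm-up and timing loop against the 4-bit `model` above gives a like-for-like tokens/sec comparison between the two configurations.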