Dec. 21, 2023, 1:37 p.m. | /u/mwitiderrick

Machine Learning | www.reddit.com

I removed 50% of the weights from a top leaderboard LLM without negatively impacting the evals.

Using SparseML, I was able to zero out 50% of the SOLAR-10.7B-Instruct-v1.0 weights.
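For intuition, here is a minimal PyTorch sketch of what "zeroing out 50% of the weights" means, using unstructured magnitude pruning on a toy layer. This is an illustration of the idea only, not the SparseML recipe used here:

```python
import torch
import torch.nn.utils.prune as prune

# Toy stand-in for a single linear layer of an LLM.
layer = torch.nn.Linear(4096, 4096)

# Zero out the 50% of weights with the smallest magnitudes (unstructured).
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # bake the zeros into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2%}")  # -> sparsity: 50.00%
```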

I then quantized the remaining weights to INT8.
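The INT8 step can be sketched the same way. Below, PyTorch's post-training dynamic quantization serves as a stand-in (SparseML's actual quantization flow is more involved):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())

# Post-training dynamic quantization: weights are stored as INT8 and
# activations are quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```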

The results are amazing!



[Screenshot: zero-shot eval results] https://preview.redd.it/uefy5u1hin7c1.png?width=927&format=png&auto=webp&s=35f9c3a07ab3e7f3a0e22a7528adeafc71c4e8e5

Even after pruning the model to 50% sparsity and quantizing it to INT8, I still got stellar zero-shot evaluation results.
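For anyone who wants to reproduce the numbers, here is a sketch of running zero-shot evals with EleutherAI's lm-evaluation-harness Python API. The task list is a placeholder, the model path points at the dense base model, and exact API details can vary by harness version:

```python
import lm_eval  # EleutherAI lm-evaluation-harness (pip install lm-eval)

# Zero-shot (num_fewshot=0) evaluation; swap in the pruned model's path.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=upstage/SOLAR-10.7B-Instruct-v1.0",
    tasks=["hellaswag", "winogrande", "arc_easy"],
    num_fewshot=0,
)
print(results["results"])
```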



Try the model:



[Screenshot: model page] https://preview.redd.it/r5tmixshin7c1.png?width=1999&format=png&auto=webp&s=61370090bb0083fecde7b00310bda71527e2eb61

Interestingly, the model is pruned and quantized in one shot. This means that no retraining is required.
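Putting the two sketches above together makes the "one shot" point concrete: both transforms are applied post-training, with no loss, optimizer, or backward pass anywhere. Again, magnitude pruning here stands in for the SparseGPT-style one-shot method SparseML actually uses:

```python
import torch
import torch.nn.utils.prune as prune

def oneshot_compress(model: torch.nn.Module) -> torch.nn.Module:
    """Prune every Linear layer to 50% sparsity, then quantize to INT8.

    No gradient updates happen anywhere -- that is what "one shot" means.
    """
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
            prune.remove(module, "weight")
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```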

