Dec. 1, 2023, 4:31 p.m. | /u/danielhanchen

Hey [r/MachineLearning](https://www.reddit.com/r/MachineLearning/)!

I manually derived the backpropagation steps, optimized the chained matrix multiplications, wrote all kernels in OpenAI's Triton language, and used some more maths and coding trickery to make QLoRA finetuning for Llama 5x faster with Unsloth: [https://github.com/unslothai/unsloth](https://github.com/unslothai/unsloth)! Some highlights (with a couple of illustrative sketches after the list):

* **5x faster** (e.g. 5 hours of training down to 1 hour)
* Uses **50% less memory**
* With **0% loss in accuracy**
* Runs entirely **locally** on NVIDIA GPUs (Tesla T4, RTX 20/30/40 series, Ampere, Hopper) for **free**!
* QLoRA / LoRA is now 80% …
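
To make the "chained matrix multiplication" point concrete: in LoRA the weight update is a low-rank product `B @ A`, so the bracketing order of the matmuls changes the FLOP count by orders of magnitude. Here is a toy PyTorch sketch of that idea; the shapes are made up for illustration, and this is not Unsloth's actual code:

```python
# Illustration of the chained-matmul bracketing trick for LoRA, where the
# weight delta is B @ A with rank r << d, k. Shapes are hypothetical.
import torch

n, k, d, r = 2048, 4096, 4096, 16   # batch*seq, in-dim, out-dim, LoRA rank
X = torch.randn(n, k)
A = torch.randn(r, k)               # LoRA "A" projection
B = torch.randn(d, r)               # LoRA "B" projection

# Bad bracketing: materialise the full d x k weight update first.
# Cost: ~2*d*r*k + 2*n*k*d FLOPs (~69 GFLOP for these shapes).
slow = X @ (B @ A).T

# Good bracketing: stay inside the rank-r bottleneck the whole way.
# Cost: ~2*n*k*r + 2*n*r*d FLOPs (~0.5 GFLOP) -- two orders of magnitude less.
fast = (X @ A.T) @ B.T

assert torch.allclose(slow, fast, rtol=1e-3, atol=1e-2)
```

The same bookkeeping applies to the hand-derived backward pass: picking the cheap bracketing for each gradient term is where a lot of the speedup comes from.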
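And for a flavour of what "wrote all kernels in Triton" looks like, here is a minimal generic elementwise kernel; it is a hypothetical example of the style, not one of Unsloth's fused kernels:

```python
# A minimal Triton elementwise kernel: out = x + alpha * y.
# Generic example only -- not Unsloth code.
import torch
import triton
import triton.language as tl

@triton.jit
def scaled_add_kernel(x_ptr, y_ptr, out_ptr, alpha, n_elements,
                      BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + alpha * y, mask=mask)

def scaled_add(x, y, alpha=1.0):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)       # one program per 1024 elements
    scaled_add_kernel[grid](x, y, out, alpha, n, BLOCK_SIZE=1024)
    return out
```

Writing the LoRA/QLoRA forward and backward as fused kernels like this (instead of chains of separate PyTorch ops) is what cuts the memory traffic and gives the speedup without touching accuracy.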
