May 31, 2023, 3 p.m. | Venelin Valkov (www.youtube.com)

In this video, we'll look at QLoRA, an efficient finetuning approach that significantly reduces the GPU memory usage of large language models. With QLoRA, you can finetune a 65B-parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. We'll dive into the technical details: QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low-Rank Adapters (LoRA).
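
To make the mechanism concrete, here is a minimal sketch of a QLoRA-style setup, assuming the Hugging Face transformers, peft, and bitsandbytes libraries (the model name and LoRA hyperparameters below are illustrative, not the exact configuration from the video):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Load the pretrained model with frozen, 4-bit NF4-quantized weights.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat from the QLoRA paper
        bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b",               # illustrative model; any causal LM works
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Attach small trainable Low-Rank Adapters; gradients are backpropagated
    # through the frozen 4-bit base weights into these adapters only.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()       # typically well under 1% of all weights

Because only the low-rank adapter matrices receive gradients and optimizer state, the training memory footprint stays close to that of the 4-bit base model.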

Prompt Engineering Tutorial: https://www.mlexpert.io/prompt-engineering
Prompt Engineering GitHub Repository: https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain

Discord: …

