March 7, 2024, 3:23 p.m. | Aleksa Gordić - The AI Epiphany

Aleksa Gordić - The AI Epiphany www.youtube.com

Become a patron: https://www.patreon.com/theaiepiphany
👨‍👩‍👧‍👦 Join our Discord community: https://discord.gg/peBrCpheKE

Horace He joined us today to talk more about how to make inference fast using just PyTorch native operations!

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
https://pytorch.org/blog/accelerating-generative-ai-2/
https://github.com/pytorch-labs/gpt-fast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00 - 00:45 Intro
00:45 - 02:23 HyperStack GPUs! (sponsored)
02:23 - 08:40 What is GPT-Fast?
08:40 - 28:15 PyTorch compile
28:15 - 32:15 int8 quantization
32:15 - 40:12 Speculative Decoding
40:12 - 42:05 int4 quantization
42:05 - 45:25 Putting it all together, tensor …
