Aug. 23, 2023, 5:16 p.m. | /u/thecharlieblake

Machine Learning | www.reddit.com

The latest gen of AI chips can do FP8 compute, but making the most of this isn't straightforward - just naïvely inserting FP8 casts causes training to fail (e.g. grads underflow).
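To see the failure mode concretely (a quick check, assuming a PyTorch build that ships the float8 dtypes, i.e. 2.1+): typical gradient magnitudes sit well below FP8's smallest representable values, so a naive cast flushes them to zero.

```python
import torch

# FP8 has a very narrow dynamic range: the smallest representable magnitudes
# are roughly 2**-9 for e4m3 and 2**-16 for e5m2. Gradients of ~1e-6 simply
# disappear when cast naively.
grads = torch.full((4,), 1e-6)
print(grads.to(torch.float8_e4m3fn).float())  # underflows to all zeros
```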

To fix this I've been working on a method called *unit scaling*, which I demo in this notebook: [github.com/graphcore-research/out-of-the-box-fp8-training.ipynb](https://github.com/graphcore-research/out-of-the-box-fp8-training/blob/main/out_of_the_box_fp8_training.ipynb)

With a one-line code change (`model = unit_scale(model)`), FP8 training now matches the FP32 loss.
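For context, this is roughly what that looks like in a training script. The import path and a toy model are stand-ins here (the notebook uses nanoGPT); treat the exact API as an assumption and check the linked notebook:

```python
import torch.nn as nn

# Assumed import path, based on the repo linked above; the notebook is the
# source of truth for the exact API.
from unit_scaling.transforms import unit_scale

# Any ordinary PyTorch model (nanoGPT in the notebook; a toy MLP here).
model = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))

# The one-line change: swap the model's ops for unit-scaled equivalents, so
# FP8 matmuls can be used without activations/grads under- or overflowing.
model = unit_scale(model)
```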

It works by re-scaling operations in the fwd & bwd pass so that training …
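Roughly, each op gets a fixed scale in the forward pass and a separate fixed scale in the backward pass, chosen so that activations and gradients both come out with ~unit variance and therefore stay inside FP8's narrow range. A minimal sketch of the idea (hypothetical names, not the library's actual implementation):

```python
import math
import torch


class ScaledIdentity(torch.autograd.Function):
    """Multiplies by fwd_scale in the forward pass and bwd_scale in the backward pass."""

    @staticmethod
    def forward(ctx, x, fwd_scale, bwd_scale):
        ctx.bwd_scale = bwd_scale
        return x * fwd_scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.bwd_scale, None, None


def unit_scaled_linear(x, weight):
    # For y = x @ W.T with unit-variance x and W:
    #   fwd: Var(y) grows with fan_in        -> scale the output by 1/sqrt(fan_in)
    #   bwd: Var(grad_x) grows with fan_out  -> scale the incoming grad by 1/sqrt(fan_out)
    # Using different fwd/bwd scales trades exact gradients for well-scaled ones.
    fan_out, fan_in = weight.shape
    y = x @ weight.t()
    return ScaledIdentity.apply(y, 1 / math.sqrt(fan_in), 1 / math.sqrt(fan_out))
```

In the full method every op in the model (matmuls, residual adds, attention, etc.) gets its scales derived this way, which is what the `unit_scale(model)` transform automates.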
