At Yandex, we’ve developed an enhanced version of FSDP, called YaFSDP, which delivers a speedup of up to 26% in LLM training time compared to FSDP, along with substantial savings in GPU resources. For instance, when pre-training a model with 70 billion parameters, YaFSDP can save the resources of approximately 150 GPUs. Depending on the virtual GPU provider or platform, this translates to roughly $0.5 to $1.5 million in potential monthly savings.

