April 9, 2024, 6:25 a.m. | /u/stereotypical_CS

Machine Learning www.reddit.com

Pardon my bad diagrams. I'm trying to understand how data parallelism works with an [asynchronous parameter server](https://docs.ray.io/en/latest/ray-core/examples/plot_parameter_server.html#asynchronous-parameter-server-training).

My current understanding is that there is an async parameter server and (for example) 2 GPU workers. Each GPU worker's job is to compute the gradient for one batch of data and send that gradient to the parameter server. The parameter server then computes the new weights and sends them back to the respective GPU without waiting on …
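That understanding matches the asynchronous scheme: the server applies each worker's gradient as it arrives and replies with fresh weights, with no barrier across workers. A minimal single-process sketch of that flow, using threads to stand in for GPU workers and a toy loss `(w - 3)^2` (the class and function names here are illustrative, not from the Ray example):

```python
import threading

class AsyncParameterServer:
    """Holds the weights; applies each worker's gradient on arrival,
    without waiting for any other worker (asynchronous SGD)."""
    def __init__(self, w0, lr):
        self.w = w0
        self.lr = lr
        self._lock = threading.Lock()  # protects w during concurrent updates

    def get_weights(self):
        with self._lock:
            return self.w

    def apply_gradient(self, grad):
        # Apply immediately -- no synchronization barrier across workers.
        with self._lock:
            self.w -= self.lr * grad
            return self.w  # fresh weights go back to the sending worker

def worker(server, steps):
    # One "GPU worker": pull weights, compute a batch gradient, push it.
    for _ in range(steps):
        w = server.get_weights()      # may be slightly stale under async updates
        grad = 2.0 * (w - 3.0)        # gradient of the toy loss (w - 3)^2
        server.apply_gradient(grad)

server = AsyncParameterServer(w0=0.0, lr=0.1)
threads = [threading.Thread(target=worker, args=(server, 50)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.get_weights())  # converges near the minimum at w = 3.0
```

The key asynchronous property is in `apply_gradient`: a gradient may have been computed from weights that another worker has since updated (a stale gradient), but the server applies it anyway, trading some staleness for never idling a worker.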

