April 9, 2024, 6:25 a.m. | /u/stereotypical_CS

Machine Learning www.reddit.com

Pardon my bad diagrams. I'm trying to understand how data parallelism works with an [asynchronous parameter server](https://docs.ray.io/en/latest/ray-core/examples/plot_parameter_server.html#asynchronous-parameter-server-training).

My current understanding is that there is an async parameter server and (for example) 2 GPU workers. Each GPU worker's job is to compute the gradient on one batch of data, then send that gradient update to the parameter server. The parameter server then computes the new weights and sends them back to that GPU without waiting on …
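That understanding matches how asynchronous parameter-server training is usually described: each worker pushes its gradient as soon as it finishes a batch, and the server applies it immediately and replies with fresh weights, with no barrier synchronizing the workers. Here is a minimal single-process sketch of that loop using threads as stand-in "GPU workers" and a toy quadratic objective; the class and function names (`AsyncParameterServer`, `worker`) are invented for illustration, not from Ray's API.

```python
import threading
import numpy as np

class AsyncParameterServer:
    """Toy async parameter server: applies each gradient as it arrives."""
    def __init__(self, dim, lr=0.1):
        self.weights = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()  # guards only the weight update itself

    def get_weights(self):
        with self.lock:
            return self.weights.copy()

    def apply_gradient(self, grad):
        # No barrier across workers: whichever gradient arrives first is
        # applied first, then the updated weights go back to that worker.
        with self.lock:
            self.weights -= self.lr * grad
            return self.weights.copy()

def worker(server, data, num_steps):
    weights = server.get_weights()  # initial pull
    for step in range(num_steps):
        x = data[step % len(data)]
        # Gradient of the toy objective 0.5 * ||w - x||^2 w.r.t. w.
        # Note: computed from possibly *stale* weights -- the other worker
        # may have updated the server since this worker last pulled.
        grad = weights - x
        weights = server.apply_gradient(grad)  # push gradient, pull fresh weights

server = AsyncParameterServer(dim=2)
data = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
workers = [threading.Thread(target=worker, args=(server, data, 50)) for _ in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(server.weights)  # drifts toward the data mean [2, 3]
```

The point of the sketch is the staleness: a worker's gradient is applied to weights that may already have moved, which is exactly the trade-off (throughput vs. gradient freshness) that distinguishes the asynchronous scheme from synchronous data parallelism.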

