Oct. 15, 2023, 3:41 p.m. | Ryan Pégoud

Towards Data Science - Medium towardsdatascience.com

In this article, we learn to vectorize an RL environment and train 30 Q-learning agents in parallel on a CPU, at 1.8 million iterations per second.

Image by Google DeepMind on Unsplash

In the previous story, we introduced Temporal-Difference Learning, particularly Q-learning, in the context of a GridWorld.

Temporal-Difference Learning and the importance of exploration: An illustrated guide

While this implementation served its purpose of demonstrating the differences in performance and in the exploration mechanisms of these algorithms, it was painfully …
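To make the idea of training agents in parallel concrete, here is a minimal sketch in JAX of a tabular Q-learning update applied to 30 agents at once with `jax.vmap`. This is not the article's code: the names `q_update` and `batched_q_update`, the 25-state / 4-action table size, and the values of `ALPHA` and `GAMMA` are all assumptions chosen for illustration. Only the count of 30 agents comes from the text above.

```python
import jax
import jax.numpy as jnp

# 30 agents as in the article; table size and hyperparameters are assumed.
N_AGENTS, N_STATES, N_ACTIONS = 30, 25, 4
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor


def q_update(q, state, action, reward, next_state):
    """One tabular Q-learning update for a single agent (hypothetical helper)."""
    td_target = reward + GAMMA * jnp.max(q[next_state])
    td_error = td_target - q[state, action]
    # JAX arrays are immutable, so .at[...].add returns an updated copy.
    return q.at[state, action].add(ALPHA * td_error)


# vmap maps the single-agent update over a leading "agent" axis of every
# argument; jit compiles the batched function.
batched_q_update = jax.jit(jax.vmap(q_update))

# Dummy transitions: every agent takes action 1 in state 0, receives
# reward 1 and lands in state 1.
q_tables = jnp.zeros((N_AGENTS, N_STATES, N_ACTIONS))
states = jnp.zeros(N_AGENTS, dtype=jnp.int32)
actions = jnp.ones(N_AGENTS, dtype=jnp.int32)
rewards = jnp.ones(N_AGENTS)
next_states = jnp.ones(N_AGENTS, dtype=jnp.int32)

q_tables = batched_q_update(q_tables, states, actions, rewards, next_states)
```

The update function is written for one agent and one Q-table, and `jax.vmap` adds the batch dimension, so the single-agent logic stays readable while all 30 tables are updated in one call. The throughput this achieves depends on the hardware and on how the environment itself is vectorized, which this sketch does not cover.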

jax machine learning parallel-computing python reinforcement learning
