March 1, 2024, 1:04 p.m. | /u/ripototo

Machine Learning www.reddit.com

I am using double and dueling deep Q-learning. Shortly after epsilon reaches 0.01, the reward starts to go downhill. I am experimenting with different hyperparameters, but I would be interested in any similar experiences or ideas.
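For reference, here is a minimal pure-Python sketch of the two pieces named above, assuming the standard formulations. The dueling head combines a state value V(s) with mean-centered advantages A(s, a). The Double DQN target lets the online network choose the next action while the target network scores it. The function names and the list-based Q-values are hypothetical stand-ins for network outputs; this is not the poster's code.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) as V + (A - mean(A)).

    Subtracting the mean advantage keeps V and A identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: List[float],
                      q_target_next: List[float]) -> float:
    """Double DQN target for one transition.

    The online network selects the next action (argmax), and the
    target network evaluates that action. Terminal transitions
    do not bootstrap.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Usage with made-up numbers:
print(dueling_q(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
print(double_dqn_target(1.0, False, 0.9, [0.1, 0.5, 0.2], [1.0, 2.0, 3.0]))
```

With these inputs, taking the target network's own maximum (3.0) would give a vanilla DQN target of 3.7. The double target instead uses the online network's choice (action 1) and gives about 2.8. Decoupling action selection from evaluation is what reduces overestimation.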

My guess is that, since it is a multi-agent scenario, for most of the exploration stage each agent learns the best actions given the mostly random actions of the others. Once epsilon reaches 0.01, the behaviors of the rest of the agents (and thus the …
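The post does not say how epsilon is annealed. As a sketch, assuming a linear decay from 1.0 to the 0.01 floor over a hypothetical `decay_steps` horizon, the schedule and action selection might look like the code below. It makes the hypothesis concrete: while epsilon is high, every other agent acts almost uniformly at random. Once epsilon is pinned at 0.01, they act greedily at least 99% of the time, so the behavior each agent learned to respond to shifts.

```python
import random
from typing import List


def linear_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.01,
                   decay_steps: int = 100_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end, then hold it.

    eps_start and decay_steps are hypothetical defaults; only the 0.01
    floor comes from the post.
    """
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def epsilon_greedy(q_values: List[float], epsilon: float,
                   rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
for step in (0, 50_000, 100_000, 500_000):
    eps = linear_epsilon(step)
    print(step, round(eps, 3), epsilon_greedy([0.1, 0.9, 0.3], eps, rng))
```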
