Jan. 11, 2022, 10:07 a.m. | /u/studentani

Artificial Intelligence www.reddit.com

I created a custom 7 by 7 grid environment to apply RL algorithms, specifically Q-learning and SARSA.
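For reference, here is a minimal sketch of the two tabular update rules, since the difference between them matters when comparing results. It is not the code from my project: the function names, the list-of-lists Q-table, and the default `alpha` and `gamma` values are all assumptions made for illustration.

```python
# Tabular Q-learning and SARSA updates (illustrative sketch).
# Assumption: Q is a list of rows, one per state, each holding one
# value per action. alpha and gamma defaults are arbitrary.

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action value in the next state."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually chosen in the next state."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# 49 states (7 * 7 cells), 4 actions, all values initialised to zero.
Q = [[0.0] * 4 for _ in range(49)]
```

The only difference is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action the behaviour policy picked, so exploration affects what SARSA learns.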

The grid environment has three types of terminal states: states with a negative reward (-100), one state with the maximum reward (100), and two states with half the maximum reward (50).
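A rough sketch of such an environment is below. The cell coordinates, the number of negative states, the zero per-step reward, and the `step` function itself are hypothetical placeholders, not the actual layout.

```python
# Illustrative 7 by 7 grid world with terminal rewards.
# Assumption: every coordinate below is a made-up placeholder.

GRID_SIZE = 7

TERMINAL_REWARDS = {
    (6, 6): 100,   # maximum reward
    (2, 3): 50,    # half reward
    (4, 1): 50,    # half reward
    (1, 5): -100,  # negative reward
    (5, 2): -100,  # negative reward
}

# Actions as (row delta, column delta): up, down, left, right.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(state, action, step_reward=0):
    """Move one cell, clamping at the walls.

    Returns (next_state, reward, done). step_reward is an assumed
    per-move reward for non-terminal cells.
    """
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    next_state = (row, col)
    if next_state in TERMINAL_REWARDS:
        return next_state, TERMINAL_REWARDS[next_state], True
    return next_state, step_reward, False
```

For example, `step((6, 5), 3)` moves right into the placeholder goal cell and ends the episode, while `step((0, 0), 0)` bumps into the top wall and leaves the agent where it was.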

The main goal of training is for the agent to avoid the negative-reward states and to prefer the long-term reward (100) over the short-term half reward (50).
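Whether the far reward is actually worth more than the near one depends on the discount factor. The sketch below works through that arithmetic under two assumptions that may not match my setup: non-terminal moves give zero reward, and the step counts (2 and 10) are made-up examples.

```python
# Discounted value of a terminal reward reached after a number of moves.
# Assumption: zero reward on every non-terminal move, so the return is
# gamma ** (steps - 1) * terminal_reward.

def discounted_return(terminal_reward, steps, gamma):
    """Return seen from the start state when the reward arrives on move `steps`."""
    return (gamma ** (steps - 1)) * terminal_reward


for gamma in (0.9, 0.99):
    near = discounted_return(50, steps=2, gamma=gamma)    # half reward, close by
    far = discounted_return(100, steps=10, gamma=gamma)   # full reward, far away
    better = "far (100)" if far > near else "near (50)"
    print(f"gamma={gamma}: near={near:.2f}, far={far:.2f}, greedy choice: {better}")
```

Under these assumptions the 100 reward wins only when gamma raised to the extra number of steps stays above 0.5, so a low discount factor or a long detour can make the 50 state the optimal choice rather than a training failure.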

The trained agent behaves strangely when the half-rewarded state is closer to the …

