Web: https://www.reddit.com/r/reinforcementlearning/comments/s9xv6u/value_range_clipping_technique_in_onpolicy/

Jan. 22, 2022, 8:24 a.m. | /u/Spiritual_Fig3632

Reinforcement Learning reddit.com

Hi, I ran an ablation study on the discount factor in my custom environment. When I increased the discount factor, the value-function estimates rose across the board (from about 0.6 to about 3). I expected that, but it raised a question: should the value targets be clipped to the same range as before, to keep the value-function gradient descent stable? What do you think? Many thanks!
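A rise of this kind is what the return's scale predicts. For a constant per-step reward r, the discounted return approaches r/(1-γ), so moving γ from 0.9 to 0.99 multiplies that limit by 10. The minimal sketch below assumes a hypothetical 200-step episode with reward 0.1 per step (not the poster's environment). It computes Monte Carlo discounted returns for both values of γ and applies a hypothetical helper, `clip_targets`, that clips value targets to a fixed range. The γ values, reward, episode length, and clip range are all illustrative assumptions.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Monte Carlo discounted return G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def clip_targets(returns, low, high):
    """Hypothetical helper: clip value targets to a fixed range [low, high]."""
    return np.clip(returns, low, high)


if __name__ == "__main__":
    # Hypothetical episode: reward 0.1 at each of 200 steps.
    rewards = np.full(200, 0.1)
    for gamma in (0.9, 0.99):
        g = discounted_returns(rewards, gamma)
        limit = 0.1 / (1 - gamma)  # r / (1 - gamma), the infinite-horizon value
        print(f"gamma={gamma}: G_0={g[0]:.3f}, r/(1-gamma)={limit:.3f}")

    # Clipping the gamma=0.99 targets to a range suited to gamma=0.9.
    g99 = discounted_returns(rewards, 0.99)
    clipped = clip_targets(g99, 0.0, 1.0)
    print("targets changed by clipping:", int(np.sum(clipped != g99)))
```

In this toy setting, clipping the γ = 0.99 targets to [0, 1] alters most of them, so the clipped targets no longer match the discounted return the critic is meant to estimate.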

