Conservative Distributional Reinforcement Learning with Safety Constraints. (arXiv:2201.07286v1 [cs.LG])
Jan. 20, 2022, 2:10 a.m. | Hengrui Zhang, Youfang Lin, Sheng Han, Shuo Wang, Kai Lv
cs.LG updates on arXiv.org arxiv.org
Safe exploration can be regarded as a constrained Markov decision process
in which the expected long-term cost is constrained. Previous off-policy
algorithms convert the constrained optimization problem into the corresponding
unconstrained dual problem by applying Lagrangian relaxation.
However, the cost function in these algorithms gives inaccurate
estimates, which makes learning of the Lagrange multiplier unstable. In
this paper, we present a novel off-policy reinforcement learning algorithm
called Conservative Distributional Maximum a Posteriori Policy Optimization
(CDMPO). At first, …
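
The Lagrangian relaxation mentioned in the abstract can be illustrated with a minimal sketch. This is not CDMPO and not the authors' code; it is only the generic dual update that such methods build on. The function names, the step size, and the cost values are hypothetical and chosen purely for illustration.

```python
def lagrangian(expected_return, expected_cost, lam, cost_limit):
    """Unconstrained objective: return minus lam times the constraint violation."""
    return expected_return - lam * (expected_cost - cost_limit)


def update_multiplier(lam, cost_estimate, cost_limit, step_size=0.1):
    """One projected gradient step on the multiplier, kept non-negative.

    The multiplier grows while the estimated cost exceeds the limit and
    shrinks (down to zero) once the estimate falls below it.
    """
    return max(0.0, lam + step_size * (cost_estimate - cost_limit))


# Hypothetical cost estimates that stay above the limit of 0.5:
# the multiplier rises a little on every step.
lam = 0.0
for cost_estimate in [0.8, 0.8, 0.8]:
    lam = update_multiplier(lam, cost_estimate, cost_limit=0.5)
```

Because the multiplier is driven directly by the cost estimate, any error in that estimate feeds straight into the update, which is consistent with the instability the abstract describes.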