Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach. (arXiv:2107.09139v2 [cs.LG] UPDATED)
Jan. 24, 2022, 2:11 a.m. | Balázs Varga, Balázs Kulcsár, Morteza Haghir Chehreghani
cs.LG updates on arXiv.org arxiv.org
This paper presents a constrained policy gradient algorithm. We introduce
constraints for safe learning through the following steps. First, learning is
slowed down (lazy learning) so that the episodic policy change can be computed
with the help of the policy gradient theorem and the neural tangent kernel.
This then enables evaluation of the policy at arbitrary states as well. In
the same spirit, learning can be guided, ensuring safety, by augmenting episode
batches with states where the desired action …
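The core idea in the excerpt — that in the lazy-learning regime the change a gradient step induces in the policy at an *arbitrary* state is predicted by the neural tangent kernel between that state and the batch states — can be illustrated with a minimal numpy sketch. This is not the paper's algorithm; the feature map, states, and per-state gradient terms below are hypothetical stand-ins, and the model is made linear in its parameters so the NTK prediction is exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny policy "network": fixed random features + linear head, f(theta, s) = theta @ phi(s).
# Linear in theta, so the NTK prediction of the update is exact (the idealized lazy regime).
W = rng.normal(size=(16, 2))                      # hypothetical random feature map
phi = lambda s: np.tanh(W @ s)                    # features of a 2-d state
theta = 0.1 * rng.normal(size=16)                 # trainable head

f = lambda th, s: th @ phi(s)                     # scalar policy logit at state s

# Episode batch of states and surrogate gradients g_i = dL/df(s_i) at each state
# (stand-ins for the policy-gradient-theorem terms; values are made up).
S = rng.normal(size=(5, 2))
g = rng.normal(size=5)
eta = 0.01                                        # small step size: "lazy" learning

# One gradient step on theta: dL/dtheta = sum_i g_i * phi(s_i)
grad_theta = sum(gi * phi(si) for gi, si in zip(g, S))
theta_new = theta - eta * grad_theta

# NTK prediction of the logit change at an ARBITRARY state s' not in the batch:
#   delta_f(s') = -eta * sum_i Theta(s', s_i) * g_i,  Theta(s, s') = phi(s) @ phi(s')
s_query = rng.normal(size=2)
ntk = np.array([phi(s_query) @ phi(si) for si in S])
predicted = -eta * ntk @ g
actual = f(theta_new, s_query) - f(theta, s_query)

print(np.allclose(predicted, actual))             # exact here because f is linear in theta
```

For a real network the match is only approximate and improves as the step size shrinks, which is why the paper slows learning down; being able to predict the policy at unseen states is what makes constraint checking (and batch augmentation with safe-action states) possible before the step is taken.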