Jan. 24, 2022, 2:11 a.m. | Balázs Varga, Balázs Kulcsár, Morteza Haghir Chehreghani

cs.LG updates on arXiv.org

This paper presents a constrained policy gradient algorithm. We introduce
constraints for safe learning in the following steps. First, learning is
slowed down (lazy learning) so that the episodic policy change can be computed
with the help of the policy gradient theorem and the neural tangent kernel.
This, in turn, enables evaluation of the policy at arbitrary states. In
the same spirit, learning can be guided, ensuring safety by augmenting episode
batches with states where the desired action …
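
The idea of computing the policy change at arbitrary states through the neural tangent kernel can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny scalar network `f` stands in for a single policy output, the per-sample `weights` stand in for whatever factor the policy gradient theorem attaches to each visited state, and all function names are hypothetical. Under a small learning rate (the lazy regime), the change of the output at a query state after one gradient step is approximately the learning rate times the kernel-weighted sum over the batch.

```python
import math
import random


def init_params(hidden, seed=0):
    """Flat parameter list [w_1..w_h, b_1..b_h, v_1..v_h] (hypothetical layout)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(3 * hidden)]


def f(theta, x):
    """One-hidden-layer tanh network with scalar input and scalar output."""
    h = len(theta) // 3
    w, b, v = theta[:h], theta[h:2 * h], theta[2 * h:]
    scale = 1.0 / math.sqrt(h)
    return scale * sum(v[j] * math.tanh(w[j] * x + b[j]) for j in range(h))


def grad_f(theta, x):
    """Analytic gradient of f with respect to theta, in the same flat layout."""
    h = len(theta) // 3
    w, b, v = theta[:h], theta[h:2 * h], theta[2 * h:]
    scale = 1.0 / math.sqrt(h)
    t = [math.tanh(w[j] * x + b[j]) for j in range(h)]
    dt = [1.0 - tj * tj for tj in t]
    dw = [scale * v[j] * dt[j] * x for j in range(h)]
    db = [scale * v[j] * dt[j] for j in range(h)]
    dv = [scale * t[j] for j in range(h)]
    return dw + db + dv


def ntk(theta, x1, x2):
    """Empirical neural tangent kernel: inner product of parameter gradients."""
    g1, g2 = grad_f(theta, x1), grad_f(theta, x2)
    return sum(a * b for a, b in zip(g1, g2))


def predicted_change(theta, batch_states, weights, query, lr):
    """First-order (lazy-regime) prediction of the output change at `query`."""
    return lr * sum(ntk(theta, query, s) * g
                    for s, g in zip(batch_states, weights))


def gradient_step(theta, batch_states, weights, lr):
    """One ascent step: theta + lr * sum_i weights[i] * grad f(batch_states[i])."""
    new_theta = list(theta)
    for s, g in zip(batch_states, weights):
        gs = grad_f(theta, s)
        for k in range(len(new_theta)):
            new_theta[k] += lr * g * gs[k]
    return new_theta


if __name__ == "__main__":
    theta = init_params(hidden=16)
    states = [-1.0, 0.2, 0.9]    # states visited in the episode (illustrative)
    weights = [0.5, -1.0, 0.3]   # stand-in per-sample gradient factors
    query = 0.5                  # a state that is NOT in the batch
    lr = 1e-3                    # small step, so the linearization is accurate

    predicted = predicted_change(theta, states, weights, query, lr)
    actual = f(gradient_step(theta, states, weights, lr), query) - f(theta, query)
    print("predicted change:", predicted)
    print("actual change:   ", actual)
```

The two printed values should agree up to terms of second order in the learning rate, which is why the slow-learning assumption matters: with a large step, the kernel computed at the current parameters no longer describes the update well.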

