Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity. (arXiv:2101.01041v3 [math.OC] UPDATED)
Jan. 3, 2022, 2:10 a.m. | Kaiqing Zhang, Xiangyuan Zhang, Bin Hu, Tamer Başar
cs.LG updates on arXiv.org arxiv.org
Direct policy search serves as one of the workhorses in modern reinforcement
learning (RL), and its applications in continuous control tasks have recently
attracted increasing attention. In this work, we investigate the convergence
theory of policy gradient (PG) methods for learning the linear risk-sensitive
and robust controller. In particular, we develop PG methods that can be
implemented in a derivative-free fashion by sampling system trajectories, and
establish both global convergence and sample complexity results in the
solutions of two fundamental …
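
To make the phrase "derivative-free fashion by sampling system trajectories" concrete, here is a minimal Python sketch of a zeroth-order policy gradient step. It is not the authors' algorithm: it uses a plain scalar linear-quadratic problem instead of the risk-sensitive and robust settings the paper studies, and every name and number in it (`A`, `B`, `Q`, `R`, `HORIZON`, `X0`, `rollout_cost`, `zeroth_order_gradient`, `derivative_free_policy_search`) is a hypothetical choice made for illustration. The gradient is estimated only from the costs of two perturbed rollouts, never from the system model.

```python
import random

# Hypothetical scalar system x_{t+1} = A*x_t + B*u_t with quadratic stage cost.
# All values are illustrative assumptions, not taken from the paper.
A, B = 1.2, 1.0      # open-loop unstable dynamics
Q, R = 1.0, 1.0      # state and control cost weights
HORIZON = 50         # rollout length
X0 = 1.0             # fixed initial state, so rollouts are deterministic


def rollout_cost(gain):
    """Cost of one simulated trajectory under the linear policy u_t = -gain * x_t."""
    x, cost = X0, 0.0
    for _ in range(HORIZON):
        u = -gain * x
        cost += Q * x * x + R * u * u
        x = A * x + B * u
    return cost


def zeroth_order_gradient(gain, rng, radius=0.01):
    """Two-point gradient estimate that uses rollout costs only."""
    # In one dimension the unit sphere is {-1, +1}; for a matrix gain the
    # direction would instead be drawn uniformly from a higher-dimensional sphere.
    direction = rng.choice([-1.0, 1.0])
    cost_plus = rollout_cost(gain + radius * direction)
    cost_minus = rollout_cost(gain - radius * direction)
    return (cost_plus - cost_minus) / (2.0 * radius) * direction


def derivative_free_policy_search(gain=1.0, steps=200, step_size=0.05, seed=0):
    """Gradient descent on the gain, driven by the zeroth-order estimate."""
    rng = random.Random(seed)
    for _ in range(steps):
        gain -= step_size * zeroth_order_gradient(gain, rng)
    return gain


if __name__ == "__main__":
    final_gain = derivative_free_policy_search()
    print("initial cost:", rollout_cost(1.0))
    print("final gain:", final_gain, "final cost:", rollout_cost(final_gain))
```

The starting gain of 1.0 is chosen so that the closed loop is stable from the first iteration. With a scalar gain the two-point estimate reduces to a central difference; the random direction only matters once the gain is a vector or matrix.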