Web: http://arxiv.org/abs/2205.05800

Sept. 16, 2022, 1:12 a.m. | Tianjiao Li, Feiyang Wu, Guanghui Lan

cs.LG updates on arXiv.org

We study the problem of average-reward Markov decision processes (AMDPs) and
develop novel first-order methods with strong theoretical guarantees for both
policy evaluation and optimization. Existing on-policy evaluation methods
suffer from sub-optimal convergence rates as well as failure in handling
insufficiently random policies, e.g., deterministic policies, for lack of
exploration. To remedy these issues, we develop a novel variance-reduced
temporal difference (VRTD) method with linear function approximation for
randomized policies along with sharp convergence guarantees, and an exploratory
variance-reduced temporal …
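
For orientation, the kind of on-policy evaluation the abstract refers to can be sketched with plain average-reward TD(0) under linear function approximation. This is a minimal illustration only, not the VRTD method of the paper: it has no variance reduction and no exploration mechanism. The two-state chain, the one-hot features, the step sizes, and the name `average_reward_td0` are all assumptions made for the example.

```python
import random


def average_reward_td0(P, r, phi, steps, alpha=0.01, seed=0):
    """Plain average-reward TD(0) with linear features (illustrative sketch).

    P     : row-stochastic transition matrix of the chain induced by a fixed policy
    r     : reward received in each state
    phi   : feature vector for each state
    Returns (eta, theta): the average-reward estimate and the weight vector.
    """
    rng = random.Random(seed)
    n_states = len(P)
    theta = [0.0] * len(phi[0])
    eta = 0.0
    s = 0
    for t in range(1, steps + 1):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        reward = r[s]
        # Running mean of observed rewards estimates the average reward.
        eta += (reward - eta) / t
        # Differential TD error: reward minus average reward plus value difference.
        v = sum(w * x for w, x in zip(theta, phi[s]))
        v_next = sum(w * x for w, x in zip(theta, phi[s_next]))
        delta = reward - eta + v_next - v
        # Semi-gradient update along the current state's features.
        theta = [w + alpha * delta * x for w, x in zip(theta, phi[s])]
        s = s_next
    return eta, theta


if __name__ == "__main__":
    # Hypothetical two-state chain; state 0 pays reward 1, state 1 pays 0.
    P = [[0.9, 0.1], [0.5, 0.5]]
    r = [1.0, 0.0]
    phi = [[1.0, 0.0], [0.0, 1.0]]  # one-hot (tabular) features
    eta, theta = average_reward_td0(P, r, phi, steps=50000)
    print("average reward estimate:", eta)
    print("differential value gap:", theta[0] - theta[1])
```

For this toy chain the stationary distribution is (5/6, 1/6), so the exact average reward is 5/6 and the exact gap between the two differential values is 5/3. The estimates should settle near those values, with some residual noise because the value step size is constant.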

