Feb. 15, 2024, 5:43 a.m. | Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

cs.LG updates on arXiv.org

arXiv:2211.01595v4 Announce Type: replace-cross
Abstract: Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by the non-Markovianity of observations when the Q-learning algorithm is applied to this formulation. Based on this analysis, we propose that the criterion for agent design should be to seek good approximations of certain conditional laws. Inspired by classical stochastic control, we show that our problem …
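
Since the abstract centers on running Q-learning over a formulation that approximates certain conditional laws, a minimal sketch may help fix ideas. Everything below is an illustrative assumption, not the paper's construction: the function names (`q_learning`, `make_toy_env`), the sliding-window agent state, and the toy environment are hypothetical. The window of recent observations stands in, very crudely, for an approximation of the conditional law of the hidden state given the observation history.

```python
import random
from collections import defaultdict, deque

def q_learning(env_step, env_reset, actions, episodes=500,
               window=3, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over a windowed agent state (illustrative sketch).

    The tuple of the last `window` observations serves as a crude
    approximation of the conditional law of the hidden state given
    the observation history.
    """
    Q = defaultdict(float)  # Q[(agent_state, action)] -> value estimate
    for _ in range(episodes):
        obs = env_reset()
        hist = deque([obs] * window, maxlen=window)
        s = tuple(hist)  # agent state = window of recent observations
        done = False
        while not done:
            # epsilon-greedy action selection over the agent state
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            obs, r, done = env_step(a)
            hist.append(obs)
            s_next = tuple(hist)
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            target = r if done else r + gamma * best_next
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q

def make_toy_env():
    """Hypothetical non-Markovian toy: the hidden bit flips every step,
    and the agent sees only a noisy copy, so a single observation is a
    poor state while a short window is more informative."""
    state = {"h": 0, "t": 0}
    def reset():
        state["h"], state["t"] = random.randint(0, 1), 0
        return state["h"] ^ (random.random() < 0.1)
    def step(a):
        r = 1.0 if a == state["h"] else 0.0  # reward for guessing the hidden bit
        state["h"] ^= 1                      # deterministic flip of the hidden bit
        state["t"] += 1
        obs = state["h"] ^ (random.random() < 0.1)  # noisy observation
        return obs, r, state["t"] >= 20
    return step, reset

step, reset = make_toy_env()
Q = q_learning(step, reset, actions=[0, 1])
```

The fixed window is only the simplest possible agent state; on the abstract's reading, better agent designs are those whose internal state approximates the relevant conditional laws more faithfully.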

