all AI news
High-probability sample complexities for policy evaluation with linear function approximation
May 3, 2024, 4:54 a.m. | Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei
cs.LG updates on arXiv.org arxiv.org
Abstract: This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the …
abstract algorithms approximation arxiv complexities cs.it cs.lg decision error evaluation function horizon linear markov math.it math.oc math.st paper policy probability processes sample stat.ml stat.th type
More from arxiv.org / cs.LG updates on arXiv.org
Testing the Segment Anything Model on radiology data
1 day, 11 hours ago |
arxiv.org
Calorimeter shower superresolution
1 day, 11 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US