Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
March 19, 2024, 4:42 a.m. | Matthew Zurek, Yudong Chen
cs.LG updates on arXiv.org
Abstract: We study the sample complexity of learning an $\epsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound $\tilde{O}(SA\frac{H}{\epsilon^2})$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$ and $\epsilon$, improving on existing …
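To make the scaling in the stated bound concrete, here is a minimal sketch that computes the span of a bias vector (maximum entry minus minimum entry) and the leading-order term $SA\frac{H}{\epsilon^2}$. It assumes the constants and logarithmic factors hidden by $\tilde{O}$ are simply dropped, so the result only illustrates how the bound scales and is not an actual sample count. The function names and the bias values are hypothetical and do not come from the paper.

```python
def span(bias):
    """Span of a vector: largest entry minus smallest entry."""
    return max(bias) - min(bias)


def leading_term(num_states, num_actions, bias_span, epsilon):
    """Leading-order term S*A*H / epsilon^2.

    Assumption: constants and log factors hidden by the O-tilde
    notation are ignored, so this shows scaling only.
    """
    return num_states * num_actions * bias_span / epsilon ** 2


# Hypothetical bias function values for a 3-state MDP (illustration only).
h_star = [0.0, 1.5, 4.0]
H = span(h_star)                       # 4.0
scale = leading_term(3, 2, H, 0.5)     # 3 * 2 * 4.0 / 0.25 = 96.0
print(H, scale)
```

Halving $\epsilon$ multiplies the term by four, while doubling $H$, $S$, or $A$ only doubles it, which is the dependence the bound expresses.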