all AI news
Settling Constant Regrets in Linear Markov Decision Processes
April 17, 2024, 4:42 a.m. | Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu
cs.LG updates on arXiv.org arxiv.org
Abstract: We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs) where both the transition kernel and the reward function can be approximated by some linear function up to misspecification level $\zeta$. At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained …
abstract algorithm arxiv cs.lg decision design episodes kernel linear markov probability processes reinforcement reinforcement learning study transition type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Senior Machine Learning Engineer
@ GPTZero | Toronto, Canada
ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)
@ HelloBetter | Remote
Doctoral Researcher (m/f/div) in Automated Processing of Bioimages
@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena
Seeking Developers and Engineers for AI T-Shirt Generator Project
@ Chevon Hicks | Remote
Data Architect
@ S&P Global | IN - HYDERABAD SKYVIEW
Data Architect I
@ S&P Global | US - VA - CHARLOTTESVILLE 212 7TH STREET