Regularized Q-learning through Robust Averaging
May 6, 2024, 4:43 a.m. | Peter Schmitt-Förster, Tobias Sutter
cs.LG updates on arXiv.org arxiv.org
Abstract: We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed …
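The estimation bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the 2RA estimator (its closed form is not given in the truncated abstract); it only shows, under assumed settings, that taking the maximum of sample means overestimates the maximum expected value. The function names, the Gaussian reward model, and all parameter values are hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def max_of_sample_means(rng, n_actions, n_samples):
    """Plug-in estimate of max_a E[reward(a)] from noisy samples.

    Assumption for illustration: every action has true mean 0 and
    unit-variance Gaussian noise, so the true maximum expected value is 0.
    """
    return max(
        fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))
        for _ in range(n_actions)
    )


def estimate_bias(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Average the plug-in estimate over many trials.

    Because the true value is 0, the returned average is itself the
    (Monte Carlo estimate of the) estimation bias.
    """
    rng = random.Random(seed)
    return fmean(
        max_of_sample_means(rng, n_actions, n_samples) for _ in range(n_trials)
    )


if __name__ == "__main__":
    print("bias with 10 actions:", estimate_bias(n_actions=10))
    print("bias with 1 action:  ", estimate_bias(n_actions=1))
```

With a single action the average stays near zero, while with several actions it is clearly positive. This is the overestimation that standard Q-learning inherits from its max operator, and it is the kind of bias the abstract says the proposed estimator is designed to control.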