Jan. 17, 2022, 2:11 a.m. | Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

cs.LG updates on arXiv.org

Double Q-learning is a classical method for reducing overestimation bias,
which is caused by taking maximum estimated values in the Bellman operation.
Its variants in the deep Q-learning paradigm have shown great promise in
producing reliable value prediction and improving learning performance.
However, as shown by prior work, double Q-learning is not fully unbiased and
suffers from underestimation bias. In this paper, we show that such
underestimation bias may lead to multiple non-optimal fixed points under an
approximate Bellman operator. …
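
The two biases named in the abstract can be seen in a small Monte Carlo sketch. This is an illustration only, not the paper's method or experiments: it assumes a single state with known true action values, independent Gaussian noise on each estimate, and no learning at all. The function name `estimate_biases` and every parameter value are hypothetical choices made for the example.

```python
import random


def estimate_biases(true_values, noise_std=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the bias of the single (max) and double estimators.

    Assumption for illustration: each estimate is the true action value plus
    independent Gaussian noise. Bias is measured against max(true_values).
    """
    rng = random.Random(seed)
    best = max(true_values)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of every action value.
        q_a = [v + rng.gauss(0.0, noise_std) for v in true_values]
        q_b = [v + rng.gauss(0.0, noise_std) for v in true_values]
        # Single estimator: select and evaluate with the same estimates.
        single_total += max(q_a)
        # Double estimator: select the action with q_a, evaluate it with q_b.
        greedy = max(range(len(q_a)), key=q_a.__getitem__)
        double_total += q_b[greedy]
    return single_total / trials - best, double_total / trials - best


single_bias, double_bias = estimate_biases([1.0, 0.5, 0.5, 0.5, 0.0])
print(f"single-estimator bias: {single_bias:+.3f}")
print(f"double-estimator bias: {double_bias:+.3f}")
```

Under these assumptions the single estimator comes out above the true maximum, because the maximum of noisy values is biased upward. The double estimator comes out below it, because the action chosen with one set of estimates is sometimes suboptimal and is then scored by an independent, unbiased estimate of its lower true value.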

