March 13, 2024, 4:42 a.m. | Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada

cs.LG updates on arXiv.org

arXiv:2403.07704v1 Announce Type: new
Abstract: In deep reinforcement learning, estimating the value function to evaluate the quality of states and actions is essential. The value function is often trained with the least squares method, which implicitly assumes a Gaussian error distribution. However, a recent study suggested that, because of the properties of the Bellman operator, the error distribution for training the value function is often skewed, violating the implicit assumption of a normal error distribution in the least squares method. …
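The two points in the abstract can be illustrated with a small sketch (a toy setting of my own, not the paper's method or experiments): minimizing the squared Bellman error is maximum likelihood under an implicit Gaussian error model, and the max inside the Bellman operator can make the error distribution right-skewed even when the underlying noise is symmetric.

```python
import numpy as np

# Illustrative sketch only (hypothetical toy setup, not the paper's method):
#   1. Least-squares value training minimizes the squared Bellman error,
#      which is maximum likelihood under an implicit Gaussian error model.
#   2. The max in the Bellman operator skews the target-error distribution,
#      violating that Gaussian assumption.

rng = np.random.default_rng(0)

def squared_bellman_error(q, transitions, gamma=0.99):
    """Mean squared residual between Q(s, a) and the Bellman target
    r + gamma * max_a' Q(s', a') -- the usual least-squares objective."""
    errs = [(q[s, a] - (r + gamma * q[s_next].max())) ** 2
            for s, a, r, s_next in transitions]
    return float(np.mean(errs))

def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Toy 2-state, 2-action Q-table and a few transitions (s, a, r, s').
q = np.zeros((2, 2))
data = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 1)]
mse = squared_bellman_error(q, data)  # the least-squares training loss

# Skew from the Bellman max: add symmetric Gaussian noise to the values
# of 4 actions and take the max, as the Bellman operator does. The
# resulting error distribution is right-skewed even though each noise
# term is symmetric.
noisy_max = rng.normal(size=(200_000, 4)).max(axis=1)
skew = skewness(noisy_max)  # positive => right-skewed, non-Gaussian
```

Here the positive skewness of the max of symmetric noise terms is the mechanism the abstract attributes to the Bellman operator: the target systematically inherits inflated value estimates.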
