April 16, 2024, 4:51 a.m. | Chang Tian, Wenpeng Yin, Marie-Francine Moens

cs.CL updates on arXiv.org

arXiv:2207.11762v2 Announce Type: replace
Abstract: A dialogue policy module is an essential part of task-completion dialogue systems. Recently, interest in reinforcement learning (RL)-based dialogue policies has grown. Their performance and the quality of their action decisions rely on accurate estimation of action values. Overestimation is a widely known problem in RL: the estimate of the maximum action value is larger than the ground truth, which results in an unstable learning process and a suboptimal policy. This problem is …
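
The overestimation described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: the four equally valued actions, the Gaussian noise model, and the names `max_of_noisy_estimates` and `average_max_estimate` are all assumptions made for illustration. It shows that taking the maximum over noisy action-value estimates yields, on average, a value above the true maximum.

```python
import random


def max_of_noisy_estimates(true_values, noise_std, rng):
    """Return the maximum over noisy estimates of each action's true value."""
    return max(q + rng.gauss(0.0, noise_std) for q in true_values)


def average_max_estimate(true_values, noise_std=1.0, trials=10_000, seed=0):
    """Average the max of the noisy estimates over many independent trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(
        max_of_noisy_estimates(true_values, noise_std, rng)
        for _ in range(trials)
    )
    return total / trials


# Hypothetical setup: four actions whose true values are all 0.0,
# so the true maximum action value is exactly 0.0.
true_values = [0.0, 0.0, 0.0, 0.0]
true_max = max(true_values)

estimated_max = average_max_estimate(true_values)
bias = estimated_max - true_max  # positive: the max is overestimated

print(f"true max = {true_max:.3f}, average estimated max = {estimated_max:.3f}")
```

Each individual estimate is unbiased here, yet the maximum over them is biased upward, because the max operator tends to pick whichever action happened to receive the largest positive noise.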

