Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems. (arXiv:2209.08429v1 [cs.LG])
cs.CL updates on arXiv.org arxiv.org
Recently, self-learning methods based on user satisfaction metrics and
contextual bandits have shown promise in enabling consistent
improvements in conversational AI systems. However, directly targeting such
metrics with off-policy bandit learning objectives often increases the risk of
abrupt policy changes that break the current user experience. In this
study, we introduce a scalable framework for supporting fine-grained
exploration targets for individual domains via user-defined constraints. For
example, we may want to ensure fewer policy deviations in business-critical
domains …
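
The idea of per-domain constraints on policy deviation can be illustrated with a small sketch. This is not the paper's method, which the truncated abstract does not spell out. It assumes a logged bandit dataset, an inverse propensity scoring (IPS) estimate of reward, a deviation measure defined as the probability mass the new policy moves away from the logged action, and a Lagrangian-style penalty with one multiplier per domain. All function names, field names, and budgets are hypothetical.

```python
from collections import defaultdict


def ips_value(samples, new_policy):
    """IPS estimate of the new policy's average reward on logged data."""
    total = 0.0
    for s in samples:
        p_new = new_policy(s["context"])[s["action"]]
        total += p_new / s["propensity"] * s["reward"]
    return total / len(samples)


def domain_deviation(samples, new_policy):
    """Average probability mass moved away from the logged action, per domain.

    Assumed deviation measure for illustration only.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for s in samples:
        p_new = new_policy(s["context"])[s["action"]]
        total[s["domain"]] += 1.0 - p_new
        count[s["domain"]] += 1
    return {d: total[d] / count[d] for d in total}


def penalized_objective(samples, new_policy, budgets, multipliers):
    """Estimated reward minus a penalty for domains exceeding their budget."""
    deviations = domain_deviation(samples, new_policy)
    penalty = sum(
        multipliers[d] * max(0.0, deviations[d] - budgets[d])
        for d in deviations
    )
    return ips_value(samples, new_policy) - penalty


def update_multipliers(deviations, budgets, multipliers, step=0.5):
    """Dual ascent step: raise a multiplier when its constraint is violated."""
    return {
        d: max(0.0, multipliers[d] + step * (deviations[d] - budgets[d]))
        for d in deviations
    }
```

In this sketch, a business-critical domain would get a small budget, so even modest deviations are penalized, while a less sensitive domain could get a larger budget and explore more freely.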