Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents. (arXiv:2201.04723v1 [cs.CL])
Jan. 14, 2022, 2:10 a.m. | Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston
cs.CL updates on arXiv.org
At the heart of improving conversational AI is the open problem of how to
evaluate conversations. Issues with automatic metrics are well known (Liu et
al., 2016, arXiv:1603.08023), with human evaluations still considered the gold
standard. Unfortunately, how to perform human evaluations is also an open
problem: differing data collection methods have varying levels of human
agreement and statistical sensitivity, resulting in differing amounts of human
annotation hours and labor costs. In this work we compare five different
crowdworker-based …
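
The excerpt cuts off before naming the five crowdworker-based methods, but the trade-off it describes, that a less statistically sensitive evaluation method needs more annotations (and therefore more hours and cost) to detect the same model difference, can be made concrete with a small power calculation. The sketch below is illustrative and not from the paper: it assumes pairwise A/B preference judgments scored with a two-sided sign test, and asks how many judgments are needed to detect a given true win rate with 80% power.

    import math

    def binom_tail(n: int, k: int, p: float) -> float:
        """P(X >= k) for X ~ Binomial(n, p)."""
        return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    def critical_value(n: int, alpha: float = 0.05) -> int:
        """Smallest win count whose two-sided sign-test p-value falls
        below alpha, under the null of no preference between two
        dialogue models (p = 0.5)."""
        for k in range(n // 2, n + 1):
            if 2 * binom_tail(n, k, 0.5) < alpha:
                return k
        return n + 1  # no rejection possible at this sample size

    def power(n: int, true_win_rate: float, alpha: float = 0.05) -> float:
        """Exact probability that n pairwise judgments detect a model
        whose true win rate is `true_win_rate` (two-sided sign test)."""
        c = critical_value(n, alpha)
        # Reject when wins >= c or wins <= n - c; the regions are disjoint.
        return binom_tail(n, c, true_win_rate) + binom_tail(n, c, 1 - true_win_rate)

    # Annotation budget needed for 80% power at two effect sizes.
    for rate in (0.60, 0.55):
        n = 20
        while power(n, rate) < 0.80:
            n += 10
        print(f"true win rate {rate:.0%}: ~{n} pairwise judgments for 80% power")

Under these assumptions, shrinking the true win rate from 60% to 55% roughly quadruples the required annotation budget, since the sample size grows with the inverse square of the effect size. Differences of that magnitude between collection methods are exactly the kind of sensitivity gap the paper's comparison measures.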