Jan. 14, 2022, 2:10 a.m. | Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston

cs.CL updates on arXiv.org

At the heart of improving conversational AI is the open problem of how to
evaluate conversations. Issues with automatic metrics are well known (Liu et
al., 2016, arXiv:1603.08023), with human evaluations still considered the gold
standard. Unfortunately, how to perform human evaluations is also an open
problem: different data collection methods yield different levels of
inter-annotator agreement and statistical sensitivity, which in turn translate
into differing amounts of human annotation hours and labor cost. In this work
we compare five different crowdworker-based …
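
As a minimal illustration of the "human agreement" the abstract refers to (not the paper's actual protocol), one common way to quantify how well two crowdworkers agree on the same set of conversation judgments is Cohen's kappa. The annotator labels below are hypothetical binary preferences, purely for demonstration:

```python
# Hedged sketch: pairwise inter-annotator agreement via Cohen's kappa.
# The labels are hypothetical binary judgments (e.g., "which of two
# model conversations was better") over the same 10 conversation pairs.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
annotator_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa between annotators: {kappa:.2f}")
```

Higher agreement (and higher statistical sensitivity) means fewer annotations are needed to detect a difference between models, which is what drives the annotation-hour and labor-cost differences the authors highlight.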

