March 29, 2024, 4:48 a.m. | Luke Gessler

cs.CL updates on arXiv.org

arXiv:2305.12612v2 Announce Type: replace
Abstract: Evaluation datasets are critical resources for measuring the quality of pretrained language models. However, due to the high cost of dataset annotation, these resources are scarce for most languages other than English, making it difficult to assess the quality of language models. In this work, we present a new method for evaluation dataset construction that enables any language with a New Testament translation to receive a suite of evaluation datasets suitable for pretrained language model …
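The abstract only sketches the method at a high level, but the core resource is clear: New Testament translations are verse-aligned across languages, so each verse ID identifies parallel text. Below is a minimal illustrative sketch (not the paper's actual pipeline) of one way such a corpus could be turned into an evaluation set, here a cloze-style masked-token task. The verse IDs, the `verses` dictionary format, and the cloze format itself are all assumptions for illustration.

```python
# Minimal sketch: build a cloze-style evaluation set from a verse-keyed
# New Testament corpus. NOT the paper's method; the data format and the
# cloze task are illustrative assumptions.
import random


def build_cloze_examples(verses, mask_token="[MASK]", seed=0):
    """Mask one random non-initial token per verse to form a cloze item.

    `verses` maps a verse ID (e.g. "MAT 1:1") to a list of tokens in the
    target language.
    """
    rng = random.Random(seed)
    examples = []
    for verse_id, tokens in sorted(verses.items()):
        if len(tokens) < 2:
            continue  # too short to mask meaningfully
        idx = rng.randrange(1, len(tokens))
        masked = list(tokens)
        answer = masked[idx]
        masked[idx] = mask_token
        examples.append(
            {"id": verse_id, "text": " ".join(masked), "answer": answer}
        )
    return examples


if __name__ == "__main__":
    # Toy corpus standing in for a real verse-aligned translation.
    toy = {
        "MAT 1:1": "the book of the genealogy of jesus".split(),
        "MAT 1:2": "abraham was the father of isaac".split(),
    }
    for ex in build_cloze_examples(toy):
        print(ex)
```

Because every language's translation shares the same verse IDs, the same construction applied to any other translation yields a comparable evaluation set, which is what makes the resource attractive for low-resource languages.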

