May 20, 2022, 1:11 a.m. | Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith

cs.CL updates on arXiv.org

Natural language processing researchers have identified limitations of
evaluation methodology for generation tasks, with new questions raised about
the validity of automatic metrics and of crowdworker judgments. Meanwhile,
efforts to improve generation models tend to depend on simple n-gram overlap
metrics (e.g., BLEU, ROUGE). We argue that new advances on models and metrics
should each more directly benefit and inform the other. We therefore propose a
generalization of leaderboards, bidimensional leaderboards (Billboards), that
simultaneously tracks progress in language generation models …

