Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand (arXiv:2112.04139v2 [cs.CL] UPDATED)

May 20, 2022, 1:11 a.m. | Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith

Source: cs.CL updates on arXiv.org

Natural language processing researchers have identified limitations of
evaluation methodology for generation tasks, with new questions raised about
the validity of automatic metrics and of crowdworker judgments. Meanwhile,
efforts to improve generation models tend to depend on simple n-gram overlap
metrics (e.g., BLEU, ROUGE). We argue that new advances on models and metrics
should each more directly benefit and inform the other. We therefore propose a
generalization of leaderboards, bidimensional leaderboards (Billboards), that
simultaneously tracks progress in language generation models …

