GEMv2: Multilingual NLG Benchmarking in a Single Line of Code. (arXiv:2206.11249v2 [cs.CL] UPDATED)

June 24, 2022, 1:12 a.m. | Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashi

cs.CL updates on arXiv.org arxiv.org

Evaluation in machine learning is usually informed by past choices, for
example, which datasets or metrics to use. This standardization enables
comparison on equal footing using leaderboards, but the evaluation choices
become sub-optimal as better alternatives arise. This problem is especially
pertinent in natural language generation, which requires ever-improving suites
of datasets, metrics, and human evaluation to make definitive claims. To make
it easier to follow best practices in model evaluation, we introduce GEMv2.
The new version of the Generation, Evaluation, and …


