Measuring Taiwanese Mandarin Language Understanding | allainews.com

April 1, 2024, 4:47 a.m. | Po-Heng Chen, Sijia Cheng, Wei-Lin Chen, Yen-Ting Lin, Yun-Nung Chen

cs.CL updates on arXiv.org arxiv.org

arXiv:2403.20180v1 Announce Type: new
Abstract: The evaluation of large language models (LLMs) has drawn substantial attention in the field recently. This work focuses on evaluating LLMs in a Chinese context, specifically, for Traditional Chinese which has been largely underrepresented in existing benchmarks. We present TMLU, a holistic evaluation suit tailored for assessing the advanced knowledge and reasoning capability in LLMs, under the context of Taiwanese Mandarin. TMLU consists of an array of 37 subjects across social science, STEM, humanities, Taiwan-specific …

abstract advanced arxiv attention benchmarks chinese context cs.cl evaluation knowledge language language models language understanding large language large language models llms measuring type understanding work

More from arxiv.org / cs.CL updates on arXiv.org

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback 17 hours ago | arxiv.org

alignment arxiv cs.cl feedback +5

Can Language Model Moderators Improve the Health of Online Discourse? 17 hours ago | arxiv.org

abstract arxiv communities conversational +19

R-Tuning: Instructing Large Language Models to Say `I Don't Know' 17 hours ago | arxiv.org

arxiv cs.cl language language models +3

On-the-Fly Fusion of Large Language Models and Machine Translation 17 hours ago | arxiv.org

abstract arxiv cs.cl data +12

Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset 17 hours ago | arxiv.org

abstract arxiv assessment cs.ai +16

Making Retrieval-Augmented Language Models Robust to Irrelevant Context 17 hours ago | arxiv.org

abstract arxiv context cs.ai +14

RA-DIT: Retrieval-Augmented Dual Instruction Tuning 17 hours ago | arxiv.org

abstract arxiv build cs.ai +19

Bengali Fake Reviews: A Benchmark Dataset and Detection System 17 hours ago | arxiv.org

abstract arxiv benchmark businesses +16

How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain 17 hours ago | arxiv.org

abstract arxiv capabilities cs.cl +14

Lead Developer (AI)

@ Cere Network | San Francisco, US

View on ai-jobs.net

Research Engineer

@ Allora Labs | Remote

View on ai-jobs.net

Ecosystem Manager

@ Allora Labs | Remote

View on ai-jobs.net

Founding AI Engineer, Agents

@ Occam AI | New York

View on ai-jobs.net

AI Engineer Intern, Agents

@ Occam AI | US

View on ai-jobs.net

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net