March 28, 2024, 4:42 a.m. | Xiang Lisa Li, Urvashi Khandelwal, Kelvin Guu

cs.LG updates on arXiv.org

arXiv:2403.18286v1 Announce Type: cross
Abstract: Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), where the model's confidence score reflects how likely it is to be correct. However, while LMs may appear well-calibrated over broad distributions, this often hides significant miscalibration within narrower slices (e.g., systematic over-confidence in math can balance out systematic under-confidence in history, yielding perfect calibration in aggregate). To attain well-calibrated confidence estimates for any slice of a distribution, we propose …
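The balancing effect described in the abstract can be made concrete with a small numerical sketch. The Python snippet below is not from the paper; the slice names, accuracies, and confidence levels are hypothetical, and the data is synthetic. It computes a standard binned expected calibration error (ECE) for an over-confident "math" slice, an under-confident "history" slice, and their union, illustrating how aggregate ECE can be near zero even when each slice is badly miscalibrated.

```python
# Minimal sketch (not the paper's method): aggregate calibration can mask
# per-slice miscalibration. All numbers below are synthetic and hypothetical.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

rng = np.random.default_rng(0)
n = 10_000

# "Math" slice: model says 0.7 but is right only 50% of the time (over-confident).
math_conf = np.full(n, 0.7)
math_correct = rng.random(n) < 0.5

# "History" slice: model says 0.7 but is right 90% of the time (under-confident).
hist_conf = np.full(n, 0.7)
hist_correct = rng.random(n) < 0.9

all_conf = np.concatenate([math_conf, hist_conf])
all_correct = np.concatenate([math_correct, hist_correct])

print("math ECE:     ", expected_calibration_error(math_conf, math_correct))   # ~0.2
print("history ECE:  ", expected_calibration_error(hist_conf, hist_correct))   # ~0.2
print("aggregate ECE:", expected_calibration_error(all_conf, all_correct))     # ~0.0
```

Because the two slices err in opposite directions at the same confidence level, pooled accuracy lands right on the stated confidence, which is exactly the aggregate-calibration illusion the paper targets with per-slice recalibration.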

