c
Dec. 19, 2023, midnight | Percy Liang

stanford-crfm-website.github.io crfm.stanford.edu

It seems hard to believe that Holistic Evaluation of Language Models (HELM) was released only a year ago: November 2022 — ChatGPT had not even come out yet.  The original goal of HELM was to holistically evaluate all the language models we had access to on a set of representative scenarios (capturing language abilities, reasoning abilities, knowledge, etc.) and multiple metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency).  As a result, we ended up with something that was conceptually elegant, …

capabilities chatgpt evaluation helm language language models set

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne