HELM lite: Lightweight and Broad Capabilities Evaluation

Dec. 19, 2023, midnight | Percy Liang

stanford-crfm-website.github.io crfm.stanford.edu

It seems hard to believe that Holistic Evaluation of Language Models (HELM) was released only a year ago, in November 2022, before ChatGPT had even come out. The original goal of HELM was to holistically evaluate all the language models we had access to on a set of representative scenarios (capturing language abilities, reasoning abilities, knowledge, etc.) using multiple metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency). As a result, we ended up with something that was conceptually elegant, …
