HELM lite: Lightweight and Broad Capabilities Evaluation

Dec. 19, 2023, midnight | Percy Liang

stanford-crfm-website.github.io crfm.stanford.edu

It seems hard to believe that Holistic Evaluation of Language Models (HELM) was released only a year ago, in November 2022, before ChatGPT had even come out. The original goal of HELM was to holistically evaluate all the language models we had access to on a set of representative scenarios (capturing language abilities, reasoning abilities, knowledge, etc.) and multiple metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency). As a result, we ended up with something that was conceptually elegant, …
