Feb. 19, 2024

Nicholas Carlini (nicholas.carlini.com)

A benchmark of ~100 tests for language models, collected from actual questions I've asked of language models in the last year.
