March 24, 2024, 2:09 p.m. | Tula Masterman

Towards Data Science - Medium towardsdatascience.com

Evaluating the evolution and application of language models on real world tasks

AI students taking an exam in a classroom. Image created by author and DALL-E 3.

In the realm of education, the best exams are those that challenge students to apply what they’ve learned in new and unpredictable ways, moving beyond memorizing facts to demonstrate true understanding. Our evaluations of language models should follow the same pattern. As we see new models flood the AI space every day, whether from …

