March 8, 2024, 5:30 a.m. | Aparna Dhinakaran

Towards Data Science - Medium towardsdatascience.com

Image created by author using DALL-E 3

Testing major LLMs on how well they conduct numeric evaluations

In addition to generating text for a growing number of industry applications, LLMs are now widely used as evaluation tools. Models quantify the relevance of retrieved documents in retrieval systems, gauge the sentiment of comments and posts, and more, evaluating both human-written and AI-generated text. These evaluations are often either numeric or categorical.

Different types of LLM evals (diagram by author) …

