March 22, 2024, 5:14 p.m. | Simon Grah

Towards Data Science - Medium towardsdatascience.com

This article is the second in a series about cleaning data using Large Language Models (LLMs), with a focus on identifying errors in tabular data sets.

The sketch outlines the methodology we’ll explore in this article, which focuses on evaluating the Data Dirtiness Score of a tabular data set with minimal human involvement.

The Data Dirtiness Score

Readers are encouraged to first review the introductory article on the Data Dirtiness Score, which explains the key assumptions and demonstrates how …

article automated cleaning data data cleaning data quality data quality issues data science data set data sets deep-dives detection errors explore focus human human involvement language language models large language large language models llm llms methodology outlines quality series set tabular tabular data

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US