Jan. 18, 2024, 12:23 p.m. | Chloe Caron

DEV Community dev.to

Bad data quality can arise in any type of data, be it numerical, textual or other. As we saw in the last article of this series, LLMs like OpenAI are quite effective at detecting anomalies in textual data. However, the OpenAI anomaly detector really struggled with numerical data, reaching an accuracy of 68% even after applying multiple forms of prompt engineering (compared to ~100% for textual data).


This is to be expected since OpenAI is a large language model and …

accuracy anomaly article bigquery build data dataengineering data quality datascience llms numerical openai quality series textual type

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Data Science Analyst

@ Mayo Clinic | AZ, United States

Sr. Data Scientist (Network Engineering)

@ SpaceX | Redmond, WA