Oct. 4, 2022, 2:07 p.m. | Kevin Kho

Towards Data Science - Medium towardsdatascience.com

Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation

Husky with a magnifying glass — Image by Author

Motivation

Data pipelines can produce unexpected results in many ways. Anomalous values can cause data to be scaled incorrectly. Machine learning model drift can reduce prediction accuracy. Failures in upstream collection can introduce null values as the pipeline executes. How do we safeguard against these failure cases? …

big data, dask, data, data profiling, data science, fugue, logging, pandas, profiling, ray, scale, spark
