Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask
Oct. 4, 2022, 2:07 p.m. | Kevin Kho
Towards Data Science - Medium towardsdatascience.com
Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation
Husky with a magnifying glass (image by author)

Motivation
Data pipelines can produce unexpected results in many ways. Anomalous values can cause data to be scaled incorrectly. Machine learning model drift can reduce prediction accuracy. Failures in upstream data collection can introduce null values as the pipeline executes. How do we safeguard against these failure cases? …