May 30, 2023, 4 p.m. | /u/jonas__m

Machine Learning | www.reddit.com

Hey Redditors!

Before modeling a dataset, do you remember to check if it seems IID?

[The non-IID data on the right were collected in such a way that violates the Independent and Identically Distributed \(IID\) assumption.](https://preview.redd.it/gc1i3amf213b1.png?width=872&format=png&auto=webp&v=enabled&s=5b7ab548e3845f7ec17178bfc6b39a85247cb778)

Distribution drift and interactions between data points (e.g. autocorrelation) are common violations of the Independent and Identically Distributed (IID) assumption, which make data-driven inference **untrustworthy**.
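To make the autocorrelation case concrete, here is a minimal Python sketch. It is an illustration, not the method from this post. It assumes a single numeric column stored in collection order and tests it for lag-1 autocorrelation by comparing the observed value against random reorderings of the same values. The function names `lag1_autocorr` and `autocorr_pvalue` are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D sequence in its stored order."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def autocorr_pvalue(x, n_perm=1000, seed=0):
    """Permutation p-value: how often does a random reordering of the same
    values show lag-1 autocorrelation at least as large in magnitude?"""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(lag1_autocorr(x))
    null = np.array([abs(lag1_autocorr(rng.permutation(x))) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))

rng = np.random.default_rng(42)
iid = rng.normal(size=500)              # independent draws
walk = np.cumsum(rng.normal(size=500))  # random walk: each value depends on the last
print(lag1_autocorr(iid), autocorr_pvalue(iid))
print(lag1_autocorr(walk), autocorr_pvalue(walk))
```

The random walk shows autocorrelation close to 1 and a tiny p-value, while the independent draws do not. A check like this only sees effects that show up in the stored order of one column, so passing it does not establish that data are IID.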

I present an automated check for such IID violations that you can quickly run on any {numeric, image, text, audio, etc.} …
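For data that isn't a single numeric column, one generic idea is to work with feature vectors (for example, embeddings of images, text, or audio). You then ask whether points that are neighbors in feature space also tend to be neighbors in dataset order. The sketch below is one such check written with NumPy only. It is an assumption-laden illustration, not the author's implementation, and `neighbor_gap_pvalue` is a hypothetical name. It uses brute-force nearest neighbors, so it only suits small datasets.

```python
import numpy as np

def neighbor_gap_pvalue(features, n_perm=200, seed=0):
    """Hypothetical order-based IID check on feature vectors stored in collection order.

    Under IID sampling the row order is exchangeable, so a point's nearest neighbor
    in feature space should be no closer in *order* than chance. Drift or
    autocorrelation makes neighbors cluster in order, shrinking the mean index gap.
    Returns a one-sided permutation p-value (small => likely non-IID ordering).
    """
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as a single feature column
    n = len(X)
    # Brute-force pairwise squared Euclidean distances: O(n^2) memory, fine for a sketch.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point cannot be its own neighbor
    nn = np.argmin(d2, axis=1)    # index of each point's nearest neighbor
    observed = np.mean(np.abs(np.arange(n) - nn))
    rng = np.random.default_rng(seed)
    # Null distribution: keep the neighbor graph, assign rows random positions.
    null = np.empty(n_perm)
    for b in range(n_perm):
        pos = rng.permutation(n)
        null[b] = np.mean(np.abs(pos - pos[nn]))
    # Add-one correction keeps the p-value strictly positive.
    return float((1 + np.sum(null <= observed)) / (1 + n_perm))

# Example: a mean that drifts over the collection order.
rng = np.random.default_rng(1)
drifting = rng.normal(size=(300, 2)) + np.linspace(0.0, 10.0, 300)[:, None]
print(neighbor_gap_pvalue(drifting))
```

A small p-value suggests the ordering carries information about the features, as with drift or autocorrelation. A large one does not prove the data are IID, because this test only looks at effects tied to the order in which data were collected.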
