March 26, 2024, 1:33 a.m. | /u/databot_

Data Science www.reddit.com

Hi r/datascience!




Over the last decade, I've participated in about a dozen data science projects in industry. Once a project reaches production, unit and integration tests are critical to avoid shipping faulty features or models. I've [summarized my learnings in a blog post](https://ploomber.io/blog/ci-for-ds/); here's the summary:



1. Structure your pipeline as several tasks, each saving its intermediate results to disk
2. Implement your pipeline so that it can be parametrized
3. The first parameter …
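Points 1 and 2 can be sketched as a toy pipeline that uses only the Python standard library. This is a minimal illustration under stated assumptions, not the method from the blog post. The task names (`load_raw`, `clean`, `features`), the file names, and the `output_dir`/`n_rows` parameters are all hypothetical. Each task reads the file written by the previous task and writes its own file. `run_pipeline` exposes parameters, so the same code can run end to end on a few rows inside a temporary directory.

```python
from __future__ import annotations

import csv
import json
import tempfile
from pathlib import Path


def load_raw(output_dir: Path, n_rows: int | None = None) -> Path:
    """Task 1: produce raw records and save them to disk.

    The in-memory rows are a hypothetical stand-in for a real data source.
    """
    rows = [{"id": i, "value": i * 1.5} for i in range(100)]
    if n_rows is not None:
        rows = rows[:n_rows]  # keep only the first n_rows records
    path = output_dir / "raw.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def clean(raw_path: Path, output_dir: Path) -> Path:
    """Task 2: read the raw file, drop non-positive values, save the result."""
    with raw_path.open(newline="") as f:
        rows = [{"id": int(r["id"]), "value": float(r["value"])} for r in csv.DictReader(f)]
    cleaned = [r for r in rows if r["value"] > 0]
    path = output_dir / "clean.json"
    path.write_text(json.dumps(cleaned))
    return path


def features(clean_path: Path, output_dir: Path) -> Path:
    """Task 3: compute a (hypothetical) squared-value feature and save it."""
    rows = json.loads(clean_path.read_text())
    feats = [{"id": r["id"], "value_sq": r["value"] ** 2} for r in rows]
    path = output_dir / "features.json"
    path.write_text(json.dumps(feats))
    return path


def run_pipeline(output_dir: str | Path, n_rows: int | None = None) -> dict[str, Path]:
    """Run all tasks in order; every intermediate product lands in output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    raw = load_raw(out, n_rows=n_rows)
    cleaned = clean(raw, out)
    feats = features(cleaned, out)
    return {"raw": raw, "clean": cleaned, "features": feats}


if __name__ == "__main__":
    # Integration-style check: run the full pipeline on 10 rows in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        products = run_pipeline(tmp, n_rows=10)
        assert all(p.exists() for p in products.values())
        print(json.loads(products["features"].read_text())[:2])
```

Because every task leaves a file behind, an integration test can run the whole pipeline on a small slice and then inspect each intermediate output. A unit test can call a single task, such as `clean`, on a hand-written input file without running the steps before it.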

