April 19, 2022, 3:31 p.m. | Eduardo Blancas

Towards Data Science - Medium towardsdatascience.com

Software Engineering for Data Science

Best practices for organizing your data analysis logic

When we help members of our community make architectural decisions about their data pipelines, a recurring question is how to group the analysis logic into tasks. This blog post summarizes the advice we’ve given them for writing clean data pipelines. Our objective is to prevent an anti-pattern we’ve seen, where pipelines contain gigantic tasks:

Monolithic pipeline. Image by author.

These monolithic pipelines have many problems: …
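
To make the contrast concrete, here is a minimal sketch of the same logic written first as one large task and then as small, composable tasks. Everything in it is a hypothetical illustration: the function names (`monolithic_task`, `clean`, `summarize`, `run_pipeline`) and the sample data are assumptions, not taken from the original post or from any particular pipeline framework.

```python
# Hypothetical example: all names and data are illustrative assumptions.

def monolithic_task(rows):
    """Anti-pattern: cleaning and aggregating packed into a single task."""
    cleaned = [r for r in rows if r.get("value") is not None]
    total = sum(r["value"] for r in cleaned)
    return {"count": len(cleaned), "total": total}


def clean(rows):
    """Small task: drop rows whose value is missing."""
    return [r for r in rows if r.get("value") is not None]


def summarize(rows):
    """Small task: aggregate rows that have already been cleaned."""
    return {"count": len(rows), "total": sum(r["value"] for r in rows)}


def run_pipeline(rows, tasks):
    """Run tasks in order, feeding each task's output into the next."""
    result = rows
    for task in tasks:
        result = task(result)
    return result


raw = [{"value": 1}, {"value": None}, {"value": 4}]
modular_result = run_pipeline(raw, [clean, summarize])
```

Both versions compute the same summary, but in the modular version each step can be tested, reused, or replaced without touching the others.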

