June 23, 2022, 4:08 p.m. | Eduardo Blancas

Towards Data Science - Medium towardsdatascience.com

Software Engineering For Data Science

A step-by-step guide to going from a messy notebook to a pipeline running in Kubernetes


Notebooks are great for rapid iteration and prototyping, but they quickly get messy. After working in a notebook for a while, my code becomes difficult to manage and unsuitable for deployment. In production, code organization is essential for maintainability (it’s much easier to improve and debug organized code than a long, messy notebook).

In this post, I’ll describe …

