April 12, 2024, 5:09 a.m. | Ciro Greco

Towards Data Science - Medium towardsdatascience.com

An open source implementation of WAP using Apache Iceberg, Lambdas, and Project Nessie, all running entirely in Python

Look Ma: no JVM! Photo by Zac Ong on Unsplash

Introduction

In this blog post we provide a no-nonsense reference implementation of the Write-Audit-Publish (WAP) pattern on a data lake, using Apache Iceberg as the open table format and Project Nessie as a data catalog with git-like semantics.

We chose Nessie because its branching capabilities provide a good abstraction to implement a WAP design. …
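To make the idea concrete, here is a minimal sketch of how branching maps onto the three WAP steps. It is a toy, in-memory model written for illustration only: `ToyCatalog`, `write_audit_publish`, and the `no_null_ids` check are hypothetical names, and none of this is the Nessie or Iceberg API. The only assumption carried over from the text is that the catalog supports branches that can be created, written to in isolation, and merged back into `main`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ToyCatalog:
    """In-memory stand-in for a branch-aware catalog (hypothetical, not the Nessie API)."""

    # branch name -> table name -> rows
    branches: Dict[str, Dict[str, List[Row]]] = field(
        default_factory=lambda: {"main": {}}
    )

    def create_branch(self, name: str, source: str = "main") -> None:
        # Copy the table contents so writes on the branch never touch the source.
        self.branches[name] = {
            table: list(rows) for table, rows in self.branches[source].items()
        }

    def append(self, branch: str, table: str, rows: List[Row]) -> None:
        self.branches[branch].setdefault(table, []).extend(rows)

    def merge(self, source: str, target: str = "main") -> None:
        # Publish by replacing the target's state with the source's state.
        self.branches[target] = {
            table: list(rows) for table, rows in self.branches[source].items()
        }

    def drop_branch(self, name: str) -> None:
        del self.branches[name]


def write_audit_publish(
    catalog: ToyCatalog,
    table: str,
    rows: List[Row],
    audit: Callable[[List[Row]], bool],
) -> bool:
    """Run one WAP cycle; return True if the data was published to main."""
    branch = f"wap_{table}"
    catalog.create_branch(branch)                    # isolate the work from main
    catalog.append(branch, table, rows)              # Write
    passed = audit(catalog.branches[branch][table])  # Audit
    if passed:
        catalog.merge(branch)                        # Publish
    catalog.drop_branch(branch)                      # clean up either way
    return passed


def no_null_ids(rows: List[Row]) -> bool:
    # Example data-quality check: every row must carry a non-null "id".
    return all(row.get("id") is not None for row in rows)


catalog = ToyCatalog()
good = write_audit_publish(catalog, "events", [{"id": 1}, {"id": 2}], no_null_ids)
bad = write_audit_publish(catalog, "events", [{"id": None}], no_null_ids)
```

In this sketch the first batch passes the audit and lands on `main`, while the second fails and is discarded with its branch, so readers of `main` never see the bad row. A real catalog would do this by moving table pointers rather than copying rows.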
