April 23, 2024, 4:43 a.m. | Jacopo Tagliabue, Ciro Greco

cs.LG updates on arXiv.org

arXiv:2404.13682v1 Announce Type: cross
Abstract: As the Lakehouse architecture becomes more widespread, ensuring the reproducibility of data workloads over data lakes emerges as a crucial concern for data engineers. However, achieving reproducibility remains challenging. The size of data pipelines contributes to slow testing and iterations, while the intertwining of business logic and data management complicates debugging and increases error susceptibility. In this paper, we highlight recent advancements made at Bauplan in addressing this challenge. We introduce a system designed to …