Dec. 23, 2023, 3:38 p.m. | Sarthak Sarbahi

Towards Data Science - Medium towardsdatascience.com

Photo by Ian Taylor on Unsplash

This tutorial walks you through an analytics use case: analyzing semi-structured data with Spark SQL. We’ll start with the data engineering process, pulling data from an API, transforming it, and finally loading the transformed data into a data lake (represented by MinIO). We’ll also use Docker to introduce a best practice for setting up the environment. So, let’s dive in and see how it’s all done!

Table of contents

  1. Understanding the building blocks
  2. Setting up …

apache spark, data analysis, data engineering, docker, sql
