Feb. 13, 2024, 1:54 p.m. | Abhijith C

Towards AI - Medium pub.towardsai.net

Optimize Spark plans using deterministic and non-deterministic UDFs

Photo by Samuel Sianipar on Unsplash

Originally published on my blog.

When processing big data, efficiency is key. It’s not uncommon to get caught in long debugging cycles when working with Spark. I was recently stuck in one such cycle when one of my pipelines was taking longer than expected. It was a simple structured streaming pipeline that listened to a Kafka topic for events and performed some …

