Mitigating Redundant UDF Computations in Spark Plans
Feb. 13, 2024, 1:54 p.m. | Abhijith C
Towards AI - Medium pub.towardsai.net
Optimize Spark plans using deterministic and non-deterministic UDFs
Photo by Samuel Sianipar on Unsplash
Originally published on my blog.
When processing big data, efficiency is key. It’s not uncommon to get caught up in long debugging cycles when working with Spark. I was recently stuck in one such cycle when one of my pipelines was taking longer than expected. It was a simple Structured Streaming pipeline that listened to a Kafka topic for events and performed some …