Mitigating Redundant UDF Computations in Spark Plans
Optimize Spark plans using deterministic and non-deterministic UDFs
Photo by Samuel Sianipar on Unsplash
Originally published on my blog.
When processing big data, efficiency is key. It’s not uncommon to get caught up in long debugging cycles when working with Spark. I was recently caught in one such cycle when one of my pipelines was taking longer than expected. It was a simple Structured Streaming pipeline that listened to a Kafka topic for events and performed some …