Jan. 25, 2024, 7:21 a.m. | Jean-Claude Cote

Towards Data Science - Medium towardsdatascience.com

A Practical Guide to Optimizing Non-Equi Joins in Spark

Enriching network events with IP geolocation information is a crucial task, especially for organizations like the Canadian Centre for Cyber Security, the national CSIRT of Canada. In this article, we demonstrate how to optimize Spark SQL joins whose conditions are not simple equalities, a common challenge when working with IP geolocation data, where each event IP must be matched against a range of addresses rather than a single key.
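To make the non-equality condition concrete, here is a minimal plain-Python sketch (not Spark code, and not necessarily the optimization this article goes on to describe) of the lookup such a join has to perform: an event IP matches the geolocation row whose range satisfies `start <= ip <= end`. The table contents, the labels, and the helper names `naive_lookup` and `sorted_lookup` are assumptions made for illustration; the addresses come from blocks reserved for documentation, and the ranges are assumed not to overlap.

```python
import bisect
import ipaddress

# Hypothetical geolocation table: (range_start, range_end, label).
# Addresses are documentation-reserved blocks; the labels are made up.
GEO_RANGES = [
    ("192.0.2.0", "192.0.2.255", "region-A"),
    ("198.51.100.0", "198.51.100.255", "region-B"),
    ("203.0.113.0", "203.0.113.255", "region-C"),
]


def to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


# Integer ranges sorted by start address (assumed non-overlapping).
RANGES = sorted((to_int(s), to_int(e), label) for s, e, label in GEO_RANGES)
STARTS = [start for start, _, _ in RANGES]


def naive_lookup(ip: str):
    """Test the range condition against every row, like an unoptimized non-equi join."""
    n = to_int(ip)
    for start, end, label in RANGES:
        if start <= n <= end:
            return label
    return None


def sorted_lookup(ip: str):
    """Binary-search the last range starting at or before ip, then check its end."""
    n = to_int(ip)
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and n <= RANGES[i][1]:
        return RANGES[i][2]
    return None


events = ["192.0.2.17", "203.0.113.200", "10.0.0.1"]
enriched = [(ip, sorted_lookup(ip)) for ip in events]
```

The naive version compares every event with every range, which is the cost that makes non-equality joins expensive at scale; the sorted version shows one general way to avoid that full comparison when the ranges do not overlap.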

As cybersecurity practitioners, our reliance on enriching network events …
