Sept. 12, 2022, 2:35 p.m. | Jean-Claude Cote

Towards Data Science - Medium towardsdatascience.com

How to index hundreds of terabytes of malware using Apache Spark and Iceberg tables

Photo by Hes Mundt on Unsplash

In this article, we will show how we used Spark and Iceberg tables to implement a malware index similar to UrsaDB, and how we integrated this index into Mquery, an analyst-friendly web GUI for submitting YARA rules and displaying their results.
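This excerpt does not show the authors' Spark or Iceberg code. To illustrate what "a malware index similar to UrsaDB" means, here is a minimal in-memory sketch of an n-gram (trigram) inverted index, which is the general idea behind UrsaDB. The function names are hypothetical, and this is not the authors' implementation. At the scale the article describes, the same mapping would presumably be stored as rows of (gram, file id) in a distributed table rather than as a Python dict. That storage detail is an assumption, not something the excerpt states.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def trigrams(data: bytes) -> Set[bytes]:
    """Return the set of distinct 3-byte substrings of `data`."""
    return {data[i:i + 3] for i in range(len(data) - 2)}


def build_index(files: Iterable[Tuple[str, bytes]]) -> Dict[bytes, Set[str]]:
    """Hypothetical helper: map each trigram to the ids of files containing it."""
    index: Dict[bytes, Set[str]] = defaultdict(set)
    for file_id, content in files:
        for gram in trigrams(content):
            index[gram].add(file_id)
    return dict(index)


def candidates(index: Dict[bytes, Set[str]], literal: bytes) -> Set[str]:
    """Hypothetical helper: files that contain every trigram of `literal`.

    The result is a superset of the true matches: a file can contain all
    the trigrams without containing the literal itself, so each candidate
    still has to be confirmed by scanning it with the real YARA rule.
    """
    grams = trigrams(literal)
    if not grams:
        raise ValueError("literal must be at least 3 bytes long")
    postings = [index.get(g, set()) for g in grams]
    return set.intersection(*postings)
```

For example, indexing `("a.exe", b"MZ\x90\x00This program cannot be run")` and `("b.dll", b"MZ\x90\x00kernel32.dll")` and querying `b"kernel32"` narrows the search to `{"b.dll"}`. The intersection step only prunes files that cannot match, which is why an index like this is paired with a final YARA scan of the surviving candidates rather than replacing it.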

This proof of concept was developed during GeekWeek, an annual workshop organized by the Canadian Centre for Cyber Security that brings together …

