Dec. 9, 2023, 7:14 a.m. | Matt Collins

Towards Data Science - Medium towardsdatascience.com

Leverage some simple equations to generate related columns in test tables.

Image generated with DALL-E 3

I’ve recently been playing around with Databricks Labs Data Generator to create completely synthetic datasets from scratch. As part of this, I’ve looked at building sales data around different stores, employees, and customers. As such, I wanted to create relationships between the columns I was artificially populating — such as mapping employees and customers to a certain store.

Through using PySpark UDFs and a …

databricks data engineering data modeling pyspark python

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Data Engineer - Takealot Group (Takealot.com | Superbalist.com | Mr D Food)

@ takealot.com | Cape Town