Nov. 15, 2023, 2:08 p.m. | /u/rotterdamn8

Data Science www.reddit.com

Hi. I'm migrating SAS code to Databricks, and one thing I need to reproduce is summary statistics, especially frequency distributions: for example, what "proc freq" and the univariate functions produce in SAS.

I calculated the frequency distribution manually, but it would be helpful if there were a function that gave you that and more. I've been searching but haven't found much.

Is there a particular PySpark library I should be looking at? Thanks.
