Nov. 15, 2023, 2:08 p.m. | /u/rotterdamn8

Data Science www.reddit.com

Hi. I'm migrating SAS code to Databricks, and one thing I need to reproduce is summary statistics, especially frequency distributions: for example, the output of PROC FREQ and PROC UNIVARIATE in SAS.

I calculated the frequency distribution manually, but it would be helpful if there were a function that gives you that and more. I've been searching but haven't found much.

Is there a particular PySpark library I should be looking at? Thanks.
