June 20, 2022, 2:30 p.m. | Pan Cretan

Towards Data Science - Medium towardsdatascience.com

A demonstration using binomial and multinomial logistic regression in PySpark


With Spark 3.2.1, which was deployed locally for this article, PySpark offers a fluent API that resembles scikit-learn in expressiveness while adding the benefits of distributed computing. This article demonstrates how to use the pyspark.ml module to build ML pipelines on top of Spark DataFrames (rather than on RDDs, as with the older pyspark.mllib module). The functionality is exemplified …

