April 17, 2023, 3:53 p.m. | /u/dask-jeeves

Machine Learning www.reddit.com

Dask release `2023.2.1` introduced a new shuffling method for `dask.dataframe` called P2P, which makes sorts, merges, and joins faster while using constant memory. This article describes the problem, the new solution, and the impact on performance.


[https://medium.com/coiled-hq/shuffling-large-data-at-constant-memory-in-dask-bb683e92d70b](https://medium.com/coiled-hq/shuffling-large-data-at-constant-memory-in-dask-bb683e92d70b)
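
For readers new to the term, a shuffle regroups rows so that all rows sharing a key land in the same output partition, which is what sorts, merges, and joins need. The sketch below is a toy, in-memory illustration of that idea in plain Python; it is not Dask's P2P implementation, and the name `hash_shuffle` and the sample rows are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def hash_shuffle(partitions, key, npartitions):
    """Regroup rows so that rows sharing a key value share an output partition.

    Hypothetical helper for illustration only; not part of Dask.
    """
    outputs = defaultdict(list)
    for part in partitions:
        for row in part:
            # Every input partition may send rows to every output partition.
            target = hash(row[key]) % npartitions
            outputs[target].append(row)
    return [outputs[i] for i in range(npartitions)]


# Two input partitions whose rows are not yet grouped by "id".
inputs = [
    [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    [{"id": 1, "v": "c"}, {"id": 3, "v": "d"}],
]

shuffled = hash_shuffle(inputs, key="id", npartitions=2)

for i, part in enumerate(shuffled):
    print(i, part)
```

Because each input partition can contribute rows to each output partition, the data movement is all-to-all. This toy version holds everything in one process's memory, so it says nothing about how P2P achieves the constant-memory behavior described in the post; see the linked article for that.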
