Web: https://www.reddit.com/r/MachineLearning/comments/s3wung/d_what_framework_do_you_do_your_data_processing/

Jan. 14, 2022, 4:48 p.m. | /u/iamquah

Machine Learning reddit.com

Hey all!

I'm curious what framework everyone uses for their data processing. In industry, I've used Pandas and sklearn for smaller datasets and Dask for larger ones.

  • I know that TensorFlow and PyTorch have their own dataloader frameworks, but do they scale to large datasets? Say 100 GB or more?

  • Do people do data transformations in JAX?

  • If you had infinite time, how would you (re)do your company's tech stack?

My data is 85% time series, 10% images, and 5% tabular data.
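
To make the scaling question concrete: whether a loader copes with 100 GB mostly depends on whether it streams records in batches instead of reading everything into memory first. Below is a minimal standard-library sketch of that pattern, not how TensorFlow, PyTorch, or Dask actually implement it. The names `stream_batches` and `running_mean`, the two-column `t,value` layout, and the tiny in-memory CSV are all hypothetical stand-ins for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, List, TextIO


def stream_batches(source: TextIO, batch_size: int) -> Iterator[List[List[float]]]:
    """Yield lists of up to `batch_size` numeric rows, reading lazily from `source`."""
    reader = csv.reader(source)
    next(reader, None)  # skip the header row
    batch: List[List[float]] = []
    for row in reader:
        batch.append([float(x) for x in row])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def running_mean(batches: Iterable[List[List[float]]]) -> float:
    """Mean of the second column, accumulated one batch at a time."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(row[1] for row in batch)
        count += len(batch)
    return total / count if count else float("nan")


# Hypothetical stand-in for a large time-series CSV on disk.
demo = io.StringIO("t,value\n0,1.0\n1,2.0\n2,3.0\n3,4.0\n4,5.0\n")
batches = list(stream_batches(demo, batch_size=2))
mean_value = running_mean(batches)
```

With a real file, passing the open file handle straight into `running_mean(stream_batches(f, batch_size))` keeps only one batch in memory at a time; the `list(...)` call here is just to make the toy result easy to inspect.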

