July 4, 2022, 10:32 p.m. | /u/xanga_ghost

Data Science www.reddit.com

i need someone with more grooves in their brain to help me out here.

per the title, has anyone found it preferable to sample from a large dataset instead of training on a dask dataframe comprising all the data?

and if so, have you found there to be a heavy tradeoff in model quality? my main gripe with dask is that calls to .compute() seem to take forever.
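for concreteness, here's roughly the pattern i'm asking about (a minimal sketch; the parquet path, target column, sample fraction, and model choice are all made up):

```python
import dask.dataframe as dd
from sklearn.ensemble import RandomForestClassifier

# lazy read; nothing is loaded into memory yet
ddf = dd.read_parquet("data/*.parquet")

# sample ~1% of rows, then materialize just that sample as a
# pandas DataFrame -- this is the only .compute() in the pipeline
sample = ddf.sample(frac=0.01, random_state=42).compute()

X = sample.drop(columns=["target"])
y = sample["target"]

# fit an ordinary in-memory model on the small sample
model = RandomForestClassifier(n_jobs=-1)
model.fit(X, y)
```

the appeal being that .compute() runs once on a small fraction instead of materializing everything.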

i am still very green to it all, so apologies if …

dask, datascience, datasets, large datasets, sampling
