March 22, 2024, 1:23 a.m. | /u/Whole-Watch-7980

Machine Learning www.reddit.com

What databases for quick query and storage on large datasets?

I have a 5-million-row CSV file that is about 1 GB of text. I also have 4 other CSV files of about the same size that I eventually need to combine. However, reading the CSV files into pandas and processing them is slow. What database options would you use for machine learning projects on this dataset?

I basically have 20 million rows …
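For context on the kind of workflow being asked about, here is a minimal sketch of one possible option: loading several CSV files into a single SQLite table in batches, using only the Python standard library. SQLite is just an illustrative choice here, not something the post settles on. The function name `load_csvs`, the table name `rows`, and the batch size are hypothetical, and the sketch assumes every file shares the same header row.

```python
import csv
import sqlite3
from itertools import islice


def load_csvs(db_path, csv_paths, table="rows", batch_size=50_000):
    """Append every CSV in csv_paths to one SQLite table, a batch at a time.

    Assumes all files have the same header row (hypothetical setup).
    Columns are created without declared types, so values are stored as text.
    """
    conn = sqlite3.connect(db_path)
    try:
        for i, path in enumerate(csv_paths):
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader)  # first row holds the column names
                if i == 0:
                    cols = ", ".join(f'"{c}"' for c in header)
                    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                placeholders = ", ".join("?" for _ in header)
                insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
                # Read and insert a bounded number of rows at a time so the
                # whole file never has to sit in memory.
                while True:
                    batch = list(islice(reader, batch_size))
                    if not batch:
                        break
                    conn.executemany(insert, batch)
            conn.commit()  # one commit per file
    finally:
        conn.close()
```

Once the data is loaded, later runs can query only the rows and columns they need (for example with `sqlite3` directly, or through `pandas.read_sql_query`) instead of re-parsing the full CSV text each time.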

