Data preparation, data cleaning, and how they affect data leakage.
Aug. 24, 2022, 4:44 a.m. | /u/The-Fourth-Hokage
Data Science www.reddit.com
I have seen data preparation and data cleaning done both before and after the train-test split in scikit-learn, and I’m very confused about which order is correct for the following:
- Working with missing values
- Working with categorical data
- Feature selection
- Feature engineering
- Removing data
- Exploratory data analysis
Can someone briefly explain each of these tasks and clarify whether it should be done before or after the train-test split?
Thank you very much!
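
To make the question concrete, here is a minimal sketch of the "split first" ordering for the steps that learn something from the data (imputation, encoding, feature selection). It assumes scikit-learn's `Pipeline` and `ColumnTransformer`; the toy data, the column names (`age`, `income`, `city`), and the choice of median imputation and `SelectKBest` are made up for illustration and are not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (hypothetical columns, generated only for this sketch).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 15_000, n),
    "city": rng.choice(["A", "B", "C"], n),
})
y = (df["income"] > 50_000).astype(int)
# Inject some missing values into one numeric column.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan

# 1. Split before any step that learns statistics from the data.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Describe the learned preprocessing steps; nothing is fitted yet.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 3. fit() learns medians, scaling, categories, and selected features
#    from the training rows only; the test rows are merely transformed.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# The imputation value for "age" comes from the training split alone.
fitted_imputer = (
    model.named_steps["preprocess"]
    .named_transformers_["num"]
    .named_steps["impute"]
)
age_fill_value = fitted_imputer.statistics_[0]
print(f"test accuracy: {test_accuracy:.2f}, age fill value: {age_fill_value:.2f}")
```

Because every fitted step lives inside the pipeline, passing `model` to something like `cross_val_score(model, X_train, y_train)` would refit those steps within each fold as well.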