Aug. 6, 2023, 6:15 p.m. | /u/jesseparks13

Data Science www.reddit.com

I am really confused about what types of data exploration and inspection you are allowed to do on the whole dataset BEFORE setting aside the test set. In the end-to-end machine learning project in Hands-On Machine Learning by Aurélien Géron, the author checks the following before setting aside a test set: 1) the number of data points and null values, 2) the value counts of each categorical column, 3) a statistical summary of the numerical columns, including counts, mean, std, min, …
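The three checks listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the book's code: the toy rows and the column names `region` and `income` are hypothetical, and `split_train_test` is an illustrative helper that only marks the point where the test set is set aside. In pandas the same checks are commonly done with `df.info()` or `df.isna().sum()`, `df[col].value_counts()`, and `df.describe()`.

```python
import random
import statistics
from collections import Counter

# Hypothetical toy dataset: one dict per data point, None marks a missing value.
ROWS = [
    {"region": "north", "income": 2},
    {"region": "south", "income": 8},
    {"region": "north", "income": None},
    {"region": "east", "income": 4},
    {"region": "south", "income": 2},
]


def null_counts(rows):
    """Check 1: number of data points and missing values per column.

    Assumes every row has the same keys as the first row.
    """
    columns = rows[0].keys()
    return len(rows), {c: sum(r[c] is None for r in rows) for c in columns}


def value_counts(rows, column):
    """Check 2: how often each value occurs in a categorical column."""
    return Counter(r[column] for r in rows if r[column] is not None)


def describe(rows, column):
    """Check 3: summary statistics of a numerical column, skipping missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1 denominator), like pandas
        "min": min(values),
        "max": max(values),
    }


def split_train_test(rows, test_ratio, seed=42):
    """Hypothetical helper: shuffle a copy with a fixed seed and hold out a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    print(null_counts(ROWS))
    print(value_counts(ROWS, "region"))
    print(describe(ROWS, "income"))
    train, test = split_train_test(ROWS, test_ratio=0.2)
    print(len(train), "training rows,", len(test), "test rows")
```

Shuffling a copy with a seeded `random.Random` keeps the split reproducible across runs without touching the global random state or mutating the original rows.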
