Oct. 15, 2023, 5:51 p.m. | /u/Ambitious-Pay6329

Machine Learning www.reddit.com

What is the correct pipeline for data processing when conducting time series forecasting? Should we begin with data normalization/standardization, followed by feature selection, and then split the data into training, validation, and test sets? Or is it better to split the data first to prevent spill-over effects (data leakage)?

I'm concerned about the possibility of training my model on (part of) the test data, which could result in spill-over effects. However, if the recommended approach is to split the data first and …
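For illustration, here is a minimal sketch of the "split first" ordering the question describes: the series is split chronologically, the standardization statistics are computed from the training portion only, and those same statistics are then applied to the validation and test portions. This is one common way to avoid the spill-over concern, not a definitive pipeline. The function names, the 70/15/15 split, and the synthetic random-walk data are all assumptions made for the example.

```python
import numpy as np

def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a series in time order (no shuffling): train, then val, then test."""
    n = len(series)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]

def fit_standardizer(train):
    """Compute mean and std from the training portion only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def apply_standardizer(x, mean, std):
    """Apply previously fitted statistics to any portion of the data."""
    return (x - mean) / std

# Hypothetical data: a synthetic random walk standing in for a real series.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))

# 1) Split first, in time order.
train, val, test = chronological_split(series)

# 2) Fit the scaler on the training portion only.
mean, std = fit_standardizer(train)

# 3) Reuse the training statistics for validation and test.
train_s = apply_standardizer(train, mean, std)
val_s = apply_standardizer(val, mean, std)
test_s = apply_standardizer(test, mean, std)
```

The same idea would extend to feature selection: any step that learns something from the data would be fitted on the training portion and then only applied to the later portions.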

