Sept. 1, 2023, 8:09 a.m. | /u/heeeehuuuu

Machine Learning www.reddit.com

My dataset has around 2,300 observations and 120 variables; about 25 of these are highly correlated, so I narrowed it down to 95 variables.

I'm using R's boost_tree() with the xgboost engine as my model.

How do I decide when to stop tuning the number of **trees, mtry, min_n**, and tree depth without overfitting the data? As I increase the number of trees (or any of the other hyperparameters above), my RMSE obviously goes down, but …
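The usual answer is to stop judging by training RMSE at all: training error always falls as trees are added, so instead compare candidate hyperparameter values by RMSE on resampled (held-out) data and pick the point where validation RMSE stops improving. A minimal tidymodels sketch of that idea, assuming a data frame `train_data` with outcome column `y` (both names are placeholders, not from the original post):

```r
library(tidymodels)

# Mark the four hyperparameters in question for tuning.
spec <- boost_tree(
  trees = tune(), mtry = tune(),
  min_n = tune(), tree_depth = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

# Evaluate each candidate on 5-fold cross-validated RMSE,
# never on the training error.
folds <- vfold_cv(train_data, v = 5)

res <- tune_grid(
  workflow() %>% add_model(spec) %>% add_formula(y ~ .),
  resamples = folds,
  grid = 20,
  metrics = metric_set(rmse)
)

# Prefer the simplest model within one standard error of the best,
# which guards against chasing tiny, noise-level RMSE gains.
select_by_one_std_err(res, trees, metric = "rmse")
```

The one-standard-error rule is the practical stopping criterion here: once cross-validated RMSE plateaus (further increases in trees or depth no longer beat the best score by more than its standard error), additional tuning is fitting noise.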

