June 24, 2023, 10:34 p.m. | /u/name4reddit13

Data Science www.reddit.com

I built several models with a lot of trial and error on an imbalanced dataset (93% vs. 7%) and in the end selected the best-performing one according to the criterion "More than 50% TN, max possible TP".
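As a rough sketch of that selection rule, assuming "More than 50% TN" means a true-negative rate (TN / (TN + FP)) above 0.5 and "max possible TP" means the highest true-positive count among the models that pass: the `Candidate` class, the `select_model` helper, and all confusion-matrix numbers below are invented for illustration and are not taken from the post.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Confusion-matrix counts for one trained model (hypothetical container)."""
    name: str
    tn: int
    fp: int
    fn: int
    tp: int

    @property
    def tn_rate(self) -> float:
        # Share of actual negatives predicted as negative (specificity).
        negatives = self.tn + self.fp
        return self.tn / negatives if negatives else 0.0


def select_model(candidates: Sequence[Candidate],
                 min_tn_rate: float = 0.5) -> Optional[Candidate]:
    """Keep models whose TN rate exceeds the threshold, then take the most TPs."""
    eligible = [c for c in candidates if c.tn_rate > min_tn_rate]
    if not eligible:
        return None  # no model satisfies the TN constraint
    # Ties on TP are broken in favour of the higher TN rate.
    return max(eligible, key=lambda c: (c.tp, c.tn_rate))


# Made-up counts on a 93% / 7% split (930 negatives, 70 positives).
candidates = [
    Candidate("model_a", tn=800, fp=130, fn=40, tp=30),
    Candidate("model_b", tn=500, fp=430, fn=15, tp=55),
    Candidate("model_c", tn=400, fp=530, fn=5, tp=65),
]

best = select_model(candidates)
print(best.name if best else "no eligible model")
```

With these invented counts, `model_c` finds the most positives but fails the TN constraint, so the rule falls back to the next-best model that passes it.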

Model configuration:

* Algorithm: SVM
* Missing-values threshold: 70%
* Variance threshold: 0.05
* Correlation threshold: 80%
* Outlier action: substitute-4s
* Scaling method: standard
* Imputation method: knn
* Feature selection: none
* Imbalance: undersampling
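The last entry in the list, undersampling, can be sketched in plain Python. This is only an illustration under the assumption that it means random undersampling of the majority class down to the size of the minority class. The post does not say which variant or library was used, and the `undersample` helper and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def undersample(rows, labels, seed=0):
    """Randomly drop rows so that every class keeps as many rows as the rarest one."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Size of the smallest class sets the per-class quota.
    quota = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, quota))
    kept.sort()  # preserve the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data mirroring the 93% / 7% split mentioned in the post.
labels = [0] * 93 + [1] * 7
rows = [[float(i)] for i in range(100)]

rows_balanced, labels_balanced = undersample(rows, labels)
print(len(labels_balanced), sum(labels_balanced))
```

Resampling like this is normally applied to the training split only, so that evaluation still reflects the original class ratio.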

The model ended up with 150 …

