May 31, 2023, 7:50 p.m. | /u/qhelspil

Data Science www.reddit.com

Perhaps the term "outliers" isn't appropriate here.

What I did was calculate the SHAP values for each feature, then:

\- I took the feature with the highest importance

\- I took the index of every row with a negative SHAP value for this feature

\- I recreated the dataset with these indexes removed

I assumed a negative SHAP value means a datapoint is contributing negatively to the model, so I removed those datapoints.
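The steps above can be sketched roughly as follows. This is only an illustration, not the code from the post: it assumes a linear model with independent features, where SHAP values have the closed form `weight * (value - column mean)`, so it runs without any third-party package. The helper names, the toy data, and the weights are all hypothetical; with a tree model the SHAP values would more likely come from something like `shap.TreeExplainer`.

```python
from statistics import mean


def linear_shap_values(X, weights):
    """SHAP values under the linear-model assumption:
    phi[i][j] = weights[j] * (X[i][j] - mean of column j)."""
    col_means = [mean(col) for col in zip(*X)]
    return [
        [w * (x - m) for x, w, m in zip(row, weights, col_means)]
        for row in X
    ]


def drop_negative_shap_rows(X, y, shap_values):
    """Drop every row whose SHAP value for the most important feature is negative."""
    n_features = len(shap_values[0])
    # Global importance: mean absolute SHAP value per feature.
    importance = [
        mean(abs(row[j]) for row in shap_values) for j in range(n_features)
    ]
    top = max(range(n_features), key=importance.__getitem__)
    # Indexes of rows with a negative SHAP value for the top feature.
    negative_idx = [i for i, row in enumerate(shap_values) if row[top] < 0]
    drop = set(negative_idx)
    X_kept = [row for i, row in enumerate(X) if i not in drop]
    y_kept = [label for i, label in enumerate(y) if i not in drop]
    return top, negative_idx, X_kept, y_kept


# Hypothetical toy data: 4 rows, 2 features, binary labels.
X = [[1.0, 10.0], [2.0, 10.5], [3.0, 9.5], [4.0, 10.0]]
y = [0, 0, 1, 1]
weights = [2.0, 0.1]  # assumed coefficients of an already fitted linear model

shap_values = linear_shap_values(X, weights)
top, negative_idx, X_kept, y_kept = drop_negative_shap_rows(X, y, shap_values)
print(top, negative_idx, y_kept)
```

In this toy data a negative SHAP value only says that the feature pushed that row's prediction below the average prediction, and every removed row happens to belong to class 0. The filtered dataset therefore has a different label mix than the original, which is worth keeping in mind when comparing scores before and after.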

My F1 score increased from 70% to 90% (the data was imbalanced) …

