May 31, 2023, 7:50 p.m. | /u/qhelspil

Data Science www.reddit.com

Perhaps the term "outliers" isn't appropriate here.

What I did was calculate the SHAP values for each feature, then:

\- I took the feature with the highest importance

\- I took the index of every sample with a negative SHAP value for this feature

\- I recreated the dataset with these indexes removed

My reasoning was that a negative SHAP value means a datapoint is contributing negatively to the model, so I removed it.
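The filtering steps described above can be sketched roughly as follows. This is only an illustration, not the exact code from the post: it assumes the SHAP values are already available as a samples-by-features matrix (for example from the `shap` library), that "importance" means the mean absolute SHAP value per feature, and it uses made-up toy numbers. The helper names (`top_feature_index`, `negative_rows`, `drop_rows`) are hypothetical.

```python
# Toy SHAP matrix (made-up numbers): one row per sample, one column per feature.
# In practice this would come from a SHAP explainer; here it is hard-coded.
shap_values = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, -0.02],
    [0.25, 0.05, 0.01],
    [-0.50, -0.15, 0.03],
    [0.10, 0.30, -0.04],
]

# Toy dataset aligned row-by-row with shap_values (also made up).
X = [
    [1.0, 7.0, 3.0],
    [2.0, 6.0, 1.0],
    [3.0, 5.0, 4.0],
    [4.0, 4.0, 1.0],
    [5.0, 3.0, 5.0],
]


def top_feature_index(shap_matrix):
    """Index of the feature with the largest mean |SHAP| (assumed importance measure)."""
    n_samples = len(shap_matrix)
    n_features = len(shap_matrix[0])
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(n_features)
    ]
    return max(range(n_features), key=lambda j: mean_abs[j])


def negative_rows(shap_matrix, feature_idx):
    """Row indexes whose SHAP value for the given feature is negative."""
    return [i for i, row in enumerate(shap_matrix) if row[feature_idx] < 0]


def drop_rows(rows, to_drop):
    """Copy of rows without the indexes listed in to_drop."""
    drop = set(to_drop)
    return [row for i, row in enumerate(rows) if i not in drop]


top = top_feature_index(shap_values)        # step 1: most important feature
removed = negative_rows(shap_values, top)   # step 2: samples with negative SHAP
X_filtered = drop_rows(X, removed)          # step 3: dataset without those samples

print("top feature index:", top)
print("removed row indexes:", removed)
print("rows kept:", len(X_filtered))
```

The labels would need the same row filter applied so that they stay aligned with `X_filtered`.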

My F1 score increased from 70% to 90% (the data was imbalanced) …

