June 4, 2022, 11:23 p.m. | /u/unixmint

Data Science www.reddit.com

How am I supposed to figure out feature importance for a churn model if a lot of my independent variables are highly correlated?

For example, weekly active users vs. power users [these are just highly engaged users] vs. viral users [these are users that share our product].

VIF is screaming at me, saying 50% of my features are way above the “rule of thumb” threshold of 10.

The correlation matrix is also showing >0.8 for a lot of feature pairs.

But I’m still trying …
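
For anyone wanting to reproduce the two diagnostics mentioned in the post (VIF above 10, pairwise correlation above 0.8), here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the column names, the latent `engagement` driver, and the noise levels are made up and are not the poster's data. It uses the identity that VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse correlation matrix, so only NumPy is needed.

```python
import numpy as np


def vif_from_correlation(X):
    """Return one VIF per column of X (rows are observations)."""
    corr = np.corrcoef(X, rowvar=False)
    # VIF_j = 1 / (1 - R_j^2) is the j-th diagonal entry of the
    # inverse of the feature correlation matrix.
    return np.diag(np.linalg.inv(corr))


def high_correlation_pairs(X, names, threshold=0.8):
    """List feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic stand-in data (hypothetical): three engagement-style features
# driven by one latent factor, plus one unrelated feature.
rng = np.random.default_rng(0)
n = 2000
engagement = rng.normal(size=n)
weekly_active = engagement + 0.2 * rng.normal(size=n)
power_user = engagement + 0.2 * rng.normal(size=n)
viral_user = engagement + 0.2 * rng.normal(size=n)
tenure = rng.normal(size=n)

names = ["weekly_active", "power_user", "viral_user", "tenure"]
X = np.column_stack([weekly_active, power_user, viral_user, tenure])

vifs = vif_from_correlation(X)
pairs = high_correlation_pairs(X, names, threshold=0.8)

for name, vif in zip(names, vifs):
    print(f"{name}: VIF = {vif:.1f}")
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```

In this toy setup the three engagement-style columns should flag on both checks while `tenure` should not, which mirrors the situation described above. It only diagnoses the collinearity; it does not by itself say how to attribute importance among the correlated features.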

datascience feature importance multicollinearity
