Can/should I do stratified sampling for two variables when I split training/testing?

May 25, 2023, 3:50 p.m. | /u/NDVGuy

Data Science www.reddit.com

Hi all, I hope you don't mind a bit of a novice question. I'm working with a dataset that has large imbalances in two features that are strongly related to the target: the year and the location in which the data was collected. The Hands-On ML O'Reilly textbook mentions that you should consider stratified sampling for the train/test split when a highly impactful feature is imbalanced.

How would you handle this scenario, where two features are highly impactful and imbalanced? …
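One possible approach, sketched here rather than taken from the post, is to treat each (year, location) combination as a single stratum and split within each one. The column names `year` and `location`, the helper `stratified_split`, and the toy rows below are all hypothetical. The same idea should carry over to scikit-learn by passing a combined year/location label to the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict


def stratified_split(rows, keys, test_fraction=0.2, seed=0):
    """Split rows (dicts) into train/test, stratifying on the combination of `keys`."""
    rng = random.Random(seed)

    # Group rows by the tuple of values for the stratification keys,
    # e.g. (2020, "A") when keys = ("year", "location").
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[k] for k in keys)].append(row)

    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        # A stratum with a single row cannot be split; keep it in train.
        n_test = round(len(members) * test_fraction) if len(members) > 1 else 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical toy data: imbalanced across both year and location.
rows = (
    [{"year": 2020, "location": "A", "y": i} for i in range(50)]
    + [{"year": 2020, "location": "B", "y": i} for i in range(10)]
    + [{"year": 2021, "location": "A", "y": i} for i in range(30)]
    + [{"year": 2021, "location": "B", "y": i} for i in range(10)]
)

train, test = stratified_split(rows, keys=("year", "location"), test_fraction=0.2)
print(len(train), len(test))
```

With this setup each year/location combination contributes roughly the same fraction of its rows to the test set. Very rare combinations are the main caveat: a stratum with only one or two rows may end up entirely in the training set, so it may be worth merging rare combinations or binning years before stratifying.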


