May 15, 2022, 10:54 p.m. | /u/ptrenko123

Data Science www.reddit.com

For a lot of clients, I see issues with some categories of data being severely underrepresented in the training data.

For instance, a category could have just 10 examples in a dataset of 10,000. In such cases, do you downsample the majority classes, generate synthetic examples for the rare one, or something else?
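To make the two options concrete, here is a minimal sketch of what I mean, using only the Python standard library. The function names, the `max_ratio` and `target` parameters, and the toy "rare"/"common" labels are all made up for illustration. Note that the second function only duplicates existing rare examples (random oversampling); it is a simple stand-in and does not generate genuinely synthetic examples.

```python
import random
from collections import Counter


def group_indices(labels):
    """Map each class label to the list of positions where it occurs."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return by_class


def downsample_majority(labels, max_ratio, seed=0):
    """Keep at most max_ratio times the rarest class's count per class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = group_indices(labels)
    cap = min(len(idxs) for idxs in by_class.values()) * max_ratio
    keep = []
    for idxs in by_class.values():
        # Classes under the cap are kept whole; larger ones are randomly thinned.
        keep.extend(idxs if len(idxs) <= cap else rng.sample(idxs, cap))
    return sorted(keep)


def oversample_minority(labels, target, seed=0):
    """Pad classes below target by re-drawing their own examples with replacement."""
    rng = random.Random(seed)
    out = []
    for idxs in group_indices(labels).values():
        out.extend(idxs)
        if len(idxs) < target:
            # Duplicates only: no new information is created here.
            out.extend(rng.choices(idxs, k=target - len(idxs)))
    return out


# Toy data matching the numbers in the post: 10 rare examples out of 10,000.
labels = ["rare"] * 10 + ["common"] * 9990

down = downsample_majority(labels, max_ratio=10)
up = oversample_minority(labels, target=500)

down_counts = Counter(labels[i] for i in down)
up_counts = Counter(labels[i] for i in up)
print(down_counts)
print(up_counts)
```

With these assumed settings, downsampling throws away most of the common class, while oversampling keeps everything but repeats the same 10 rare examples many times, which is exactly the trade-off I am unsure about.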

What would be the ideal solution?

data datascience ner text
