April 18, 2024, 2:07 p.m. | /u/Melodic_Reality_646

Machine Learning www.reddit.com

Each text has no more than 15 words, and the classes are highly imbalanced. But they all have at least 30 or so instances.

I was successful with data of the same nature but with around 15 labels with an ensemble of gradient boosting models.

Before diving into testing a bunch models I wondered if there’s some strategy to tackle high-dimensional problems like this one.

Some problems are just not solvable, let’s face it. But what would you guys try?

boosting classification data ensemble gradient instances labels least machinelearning nature testing text transformers words

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Software Engineer, Data Tools - Full Stack

@ DoorDash | Pune, India

Senior Data Analyst

@ Artsy | New York City