Different splits yield very different results | allainews.com

April 21, 2024, 10:18 a.m. | /u/msaoudallah

Data Science www.reddit.com

Hello all,

I have a problem where I have to predict a class for each line in a PDF. My dataset consists of lines from different PDF files. When I shuffle the dataset and split random lines into train and test sets, I get a high score (>0.96), but when I group the dataset by document and use some documents for training and others for testing, I get a very poor score (<0.9).
what do you …
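
The two splitting strategies described in the post can be sketched as below. This is a minimal illustration, not the poster's actual code: the toy `doc_ids` list and the helper names `random_line_split`, `document_split`, and `shared_documents` are all assumptions made for the example. One common explanation for a gap like this is that a random line-level split puts lines from the same PDF in both train and test, so the model is scored on documents it has partly seen. A document-level split avoids that overlap.

```python
import random


def random_line_split(n_lines, test_fraction=0.25, seed=0):
    """Shuffle individual lines; lines from one PDF can land in both sets."""
    indices = list(range(n_lines))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_lines * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def document_split(doc_ids, test_fraction=0.25, seed=0):
    """Hold out whole documents; no PDF contributes lines to both sets."""
    docs = sorted(set(doc_ids))
    random.Random(seed).shuffle(docs)
    n_test = max(1, round(len(docs) * test_fraction))
    test_docs = set(docs[:n_test])
    train = [i for i, d in enumerate(doc_ids) if d not in test_docs]
    test = [i for i, d in enumerate(doc_ids) if d in test_docs]
    return train, test


def shared_documents(doc_ids, train, test):
    """Return the documents that have lines in both train and test."""
    return {doc_ids[i] for i in train} & {doc_ids[i] for i in test}


# Hypothetical toy data: 4 PDFs with 5 lines each (one doc id per line).
doc_ids = [f"doc{d}" for d in range(4) for _ in range(5)]

rand_train, rand_test = random_line_split(len(doc_ids))
doc_train, doc_test = document_split(doc_ids)

print("shared docs, random split:  ", shared_documents(doc_ids, rand_train, rand_test))
print("shared docs, document split:", shared_documents(doc_ids, doc_train, doc_test))
```

With scikit-learn, `GroupShuffleSplit` or `GroupKFold` from `sklearn.model_selection` perform the same kind of document-level split when the document id is passed as `groups`. If the grouped score is the one that reflects how the model will be used (on unseen PDFs), it is usually the more trustworthy estimate.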


