different splits yield a very different result | allainews.com

April 21, 2024, 10:18 a.m. | /u/msaoudallah

Data Science www.reddit.com

Hello all,

I have a problem where I have to predict a class for each line in a PDF. My dataset consists of lines from different PDF files. When I shuffle the dataset and split random lines into train and test sets, I get a high score (>0.96), but when I group the dataset by document and take some documents for training and others for testing, I get a very poor score (<0.9).
What do you …
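The two setups in the post differ in one thing: whether lines from the same PDF can land on both sides of the split. A minimal sketch of the document-level split, using only the standard library, is below. The function name `group_train_test_split`, the `doc_ids` list, and the toy file names are hypothetical and not from the original post. scikit-learn's `GroupShuffleSplit` and `GroupKFold` provide the same guarantee, if that library is available.

```python
import random
from collections import defaultdict


def group_train_test_split(doc_ids, test_fraction=0.2, seed=0):
    """Split row indices so each document falls entirely in train or in test.

    doc_ids: one document identifier per line (row) of the dataset.
    Returns (train_indices, test_indices).
    """
    # Collect the row indices that belong to each document.
    rows_by_doc = defaultdict(list)
    for idx, doc in enumerate(doc_ids):
        rows_by_doc[doc].append(idx)

    # Shuffle the documents, not the individual lines.
    docs = sorted(rows_by_doc)
    random.Random(seed).shuffle(docs)

    # Hold out a fraction of the documents (at least one) for testing.
    n_test = max(1, round(len(docs) * test_fraction))
    test_docs, train_docs = docs[:n_test], docs[n_test:]

    train_idx = [i for d in train_docs for i in rows_by_doc[d]]
    test_idx = [i for d in test_docs for i in rows_by_doc[d]]
    return train_idx, test_idx


# Hypothetical toy data: 12 lines drawn from 5 PDF files.
doc_ids = (["a.pdf"] * 3 + ["b.pdf"] * 2 + ["c.pdf"] * 4
           + ["d.pdf"] * 1 + ["e.pdf"] * 2)

train_idx, test_idx = group_train_test_split(doc_ids, test_fraction=0.4, seed=42)
train_docs = {doc_ids[i] for i in train_idx}
test_docs = {doc_ids[i] for i in test_idx}
```

With a split like this, the test score estimates performance on documents the model has never seen. A random line-level split can score higher simply because lines from one PDF tend to share layout and wording, so the model has already seen near-duplicates of its test lines. Whether that explains the gap in this case is an assumption that would need checking against the actual data.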


