Different splits yield very different results
April 21, 2024, 10:18 a.m. | /u/msaoudallah
Data Science www.reddit.com
I have a problem where I have to predict a class for each line in a PDF. My dataset consists of lines from different PDF files. When I shuffle the dataset and split it so that random lines go into the train and test sets, I get a high score (>0.96), but when I group the dataset by document, taking some documents for training and others for testing, I get a very poor score (<0.9).
what do you …
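For illustration, here is a minimal sketch of the document-level split described above, using only the Python standard library. With a random line-level split, lines from the same PDF can end up in both train and test, so the higher score may partly reflect the model recognising documents it has already seen. The function name `group_train_test_split` and the sample `doc_ids` are made up for this example and are not from the original post.

```python
import random
from collections import defaultdict


def group_train_test_split(doc_ids, test_fraction=0.2, seed=0):
    """Split row indices so each document lands entirely in train or in test.

    doc_ids: one document identifier per line (row) of the dataset.
    Returns (train_indices, test_indices).
    """
    # Collect the row indices that belong to each document.
    rows_by_doc = defaultdict(list)
    for idx, doc in enumerate(doc_ids):
        rows_by_doc[doc].append(idx)

    # Shuffle documents (not lines) with a fixed seed for reproducibility.
    docs = sorted(rows_by_doc)
    random.Random(seed).shuffle(docs)

    # Hold out a fraction of the documents, at least one.
    n_test = max(1, round(len(docs) * test_fraction))
    test_docs, train_docs = docs[:n_test], docs[n_test:]

    train_idx = [i for d in train_docs for i in rows_by_doc[d]]
    test_idx = [i for d in test_docs for i in rows_by_doc[d]]
    return train_idx, test_idx


# Hypothetical data: ten lines drawn from five PDF files.
doc_ids = ["a.pdf", "a.pdf", "b.pdf", "b.pdf", "b.pdf",
           "c.pdf", "d.pdf", "d.pdf", "e.pdf", "e.pdf"]

train_idx, test_idx = group_train_test_split(doc_ids, test_fraction=0.4, seed=42)
train_docs = {doc_ids[i] for i in train_idx}
test_docs = {doc_ids[i] for i in test_idx}
print("train documents:", sorted(train_docs))
print("test documents:", sorted(test_docs))
```

If scikit-learn is available, `GroupShuffleSplit` and `GroupKFold` in `sklearn.model_selection` do the same kind of grouping when the document identifier is passed as `groups`. The score from a split like this is usually the better estimate of how the model behaves on PDFs it has never seen.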