Is there no standard train/dev/test split for the Quora Question Pairs dataset of duplicate questions, *with labels for all splits*?
Web: https://www.reddit.com/r/LanguageTechnology/comments/sbpmnj/is_there_no_standard_traindevtest_split_for_the/
Jan. 24, 2022, 4:10 p.m. | /u/squirreltalk
Natural Language Processing reddit.com
QQP is a dataset of duplicate and non-duplicate question pairs from Quora. I think it was originally developed as part of a Kaggle competition:
https://www.kaggle.com/c/quora-question-pairs
The competition released a training set with labels, and a test set without labels. QQP has subsequently been used in many papers developing new architectures for document similarity and duplicate detection, but unfortunately, as far as I can tell, there is no standard train/dev/test split of the dataset that has labels for each of train, …
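Until a shared split exists, one workaround is to carve a reproducible train/dev/test split out of the labeled Kaggle training file. The sketch below is only an illustration, not a standard or published split: the function name `stratified_split`, the 80/10/10 fractions, and the seed are all assumptions, and it assumes each row is a dict with an `is_duplicate` label column. It stratifies by label so each split keeps roughly the same duplicate ratio.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="is_duplicate",
                     fractions=(0.8, 0.1, 0.1), seed=13):
    """Split rows into train/dev/test, stratified by label.

    Hypothetical helper: the fractions and seed are arbitrary choices,
    not part of any agreed-upon QQP split.
    """
    # Group rows by label so each class is split separately.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "dev": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_dev = int(len(group) * fractions[1])
        splits["train"].extend(group[:n_train])
        splits["dev"].extend(group[n_train:n_train + n_dev])
        # Whatever remains after train and dev goes to test.
        splits["test"].extend(group[n_train + n_dev:])
    return splits


if __name__ == "__main__":
    # Toy rows standing in for the labeled training file.
    toy = [{"id": i, "is_duplicate": int(i % 4 == 0)} for i in range(100)]
    parts = stratified_split(toy)
    print({name: len(part) for name, part in parts.items()})
```

To use it on the real data, the rows could be read with `csv.DictReader`. Note that this splits at the pair level: the same question can appear in several pairs, so individual questions may still occur in more than one split.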