Web: https://www.reddit.com/r/LanguageTechnology/comments/sbpmnj/is_there_no_standard_traindevtest_split_for_the/

Jan. 24, 2022, 4:10 p.m. | /u/squirreltalk

Natural Language Processing reddit.com

QQP is a dataset of duplicate and non-duplicate question pairs from Quora. I think it was originally developed as part of a Kaggle competition:

https://www.kaggle.com/c/quora-question-pairs

The competition released a training set with labels, and a test set without labels. QQP has subsequently been used in many papers developing new architectures for document similarity and duplicate detection, but unfortunately, as far as I can tell, there is no standard train/dev/test split of the dataset that has labels for each of train, …

dataset dev languagetechnology standard test

Research Scientist, 3D Reconstruction

@ Yembo | Remote, US

Clinical Assistant or Associate Professor of Management Science and Systems

@ University at Buffalo | Buffalo, NY

Data Analyst

@ Colorado Springs Police Department | Colorado Springs, CO

Predictive Ecology Postdoctoral Fellow

@ Lawrence Berkeley National Lab | Berkeley, CA

Data Analyst, Patagonia Action Works

@ Patagonia | Remote

Data & Insights Strategy & Innovation General Manager

@ Chevron Services Company, a division of Chevron U.S.A Inc. | Houston, TX