Jan. 24, 2022

Natural Language Processing

QQP (Quora Question Pairs) is a dataset of duplicate and non-duplicate question pairs from Quora. I think it was originally released as part of a Kaggle competition.


The competition released a training set with labels, and a test set without labels. QQP has subsequently been used in many papers developing new architectures for document similarity and duplicate detection, but unfortunately, as far as I can tell, there is no standard train/dev/test split of the dataset that has labels for each of train, …
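To make the missing-split problem concrete, here is a minimal sketch of one way to carve a reproducible train/dev/test split out of the labeled training set. It is not an established QQP split. The function names `assign_split` and `split_pairs`, the 80/10/10 proportions, and the assumption that each row is a dict with an `id` field are all hypothetical choices made for illustration.

```python
import hashlib


def assign_split(pair_id: str, dev_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map a pair id to 'train', 'dev', or 'test'.

    Hashing the id (instead of shuffling with a random seed) means the
    assignment does not depend on row order or on the library version.
    """
    digest = hashlib.sha256(pair_id.encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + dev_frac:
        return "dev"
    return "train"


def split_pairs(rows):
    """Group labeled rows into the three splits.

    Assumes (hypothetically) that each row is a dict with an 'id' key.
    """
    splits = {"train": [], "dev": [], "test": []}
    for row in rows:
        splits[assign_split(str(row["id"]))].append(row)
    return splits


if __name__ == "__main__":
    # Toy rows standing in for the labeled training file.
    toy_rows = [{"id": i, "is_duplicate": i % 2} for i in range(1000)]
    toy_splits = split_pairs(toy_rows)
    print({name: len(part) for name, part in toy_splits.items()})
```

Note that splitting by pair id does not stop the same question text from appearing in more than one split, so a stricter split would need to group pairs by question first.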

