Feb. 13, 2024, 5:49 a.m. | Kexin Wang, Nils Reimers, Iryna Gurevych

cs.CL updates on arXiv.org

Neural retrieval work has so far focused on ranking short texts and struggles with long documents. In many cases, users want to find a relevant passage within a long document drawn from a huge corpus, e.g. Wikipedia articles or research papers. We propose this task and name it Document-Aware Passage Retrieval (DAPR). Analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that a major share of the errors (53.5%) are due to missing document context. This drives us …

cs.cl cs.ir

