CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos. (arXiv:2204.00716v2 [cs.IR] UPDATED)
April 27, 2022, 1:11 a.m. | Shengyao Zhuang, Guido Zuccon
cs.CL updates on arXiv.org arxiv.org
Current dense retrievers are not robust to out-of-domain and outlier queries,
i.e., their effectiveness on these queries is much poorer than one would
expect. In this paper, we consider a specific instance of such queries: queries
that contain typos. We show that a small character-level perturbation in
queries (as caused by typos) substantially degrades the effectiveness of dense
retrievers. We then demonstrate that the root cause of this resides in the
input tokenization strategy employed by BERT. In …
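
As a rough illustration of why a single typo can matter at the tokenization stage, the sketch below applies a greedy longest-match subword split (in the style of the WordPiece scheme BERT uses) to a clean word and to a copy with two adjacent characters swapped. The vocabulary `TOY_VOCAB` and the helper names `swap_adjacent` and `greedy_wordpiece` are hypothetical and chosen only for this example; they are not BERT's actual vocabulary, and this is not the authors' method or experimental setup.

```python
def swap_adjacent(word: str, i: int) -> str:
    """Swap the characters at positions i and i+1 (a transposition typo)."""
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def greedy_wordpiece(word: str, vocab: set) -> list:
    """Greedy longest-match-first subword split, in the style of WordPiece.

    Non-initial pieces carry the conventional '##' prefix. If no piece
    matches at some position, the whole word maps to '[UNK]'.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {"information", "inf", "##rom", "##ation"}

clean = "information"
typo = swap_adjacent(clean, 3)  # "infromation"

print(greedy_wordpiece(clean, TOY_VOCAB))  # ['information']
print(greedy_wordpiece(typo, TOY_VOCAB))   # ['inf', '##rom', '##ation']
```

With this toy vocabulary, the clean word maps to one token while the misspelled word is split into three unrelated pieces, so a model downstream would see a very different input sequence for what a human reads as nearly the same query.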