Jan. 14, 2022, 2:10 a.m. | Noveen Sachdeva, Carole-Jean Wu, Julian McAuley

cs.LG updates on arXiv.org arxiv.org

We study the practical consequences of dataset sampling strategies for the
ranking performance of recommendation algorithms. Recommender systems are
generally trained and evaluated on samples of larger datasets. Samples are
often taken in a naive or ad-hoc fashion, e.g., by sampling a dataset randomly
or by selecting users or items with many interactions. As we demonstrate,
commonly used data sampling schemes can have significant consequences for
algorithm performance. Following this observation, this paper makes three main
contributions: (1) characterizing the effect …
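
The two naive schemes named in the abstract, uniform random sampling and keeping only the most active users, can be illustrated with a minimal sketch. The function names, the `fraction` parameter, and the toy interaction list are assumptions made for illustration; they are not taken from the paper and do not reproduce its experimental setup.

```python
import random
from collections import Counter


def sample_interactions_randomly(interactions, fraction, seed=0):
    """Keep a uniformly random fraction of the (user, item) interactions."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n_keep = round(len(interactions) * fraction)
    return rng.sample(interactions, n_keep)


def sample_most_active_users(interactions, fraction):
    """Keep every interaction of the top `fraction` of users by activity."""
    counts = Counter(user for user, _ in interactions)
    n_users = max(1, round(len(counts) * fraction))
    kept_users = {user for user, _ in counts.most_common(n_users)}
    return [(user, item) for user, item in interactions if user in kept_users]


# Hypothetical toy data: (user, item) pairs.
interactions = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"), ("u1", "d"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "c"),
    ("u4", "a"),
]

random_sample = sample_interactions_randomly(interactions, fraction=0.5)
head_sample = sample_most_active_users(interactions, fraction=0.5)
```

On this toy data both calls use the same `fraction`, yet the random sample keeps half of the interactions while the head-user sample keeps half of the users, which is a larger and denser subset. Differences of this kind between samples are what the abstract says can affect measured algorithm performance.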

