March 27, 2024, 4:48 a.m. | Daniel Saggau, Mina Rezaei, Bernd Bischl, Ilias Chalkidis

cs.CL updates on arXiv.org arxiv.org

arXiv:2305.16031v2 Announce Type: replace
Abstract: Learning high-quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (thousands of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longformer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further on, we complement the …
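
For readers unfamiliar with SimCSE, the in-batch contrastive objective it relies on can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `simcse_loss` is hypothetical, the temperature of 0.05 is an assumed value, random vectors stand in for encoder outputs, and small Gaussian noise stands in for the two dropout-perturbed passes that SimCSE uses to build positive pairs. No Longformer encoder is involved here.

```python
import numpy as np


def simcse_loss(view_a, view_b, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss between two views of the same documents.

    Row i of view_a and row i of view_b are treated as a positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # shape: (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


# Toy data: 8 "documents" with 16-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 16))
# Two slightly perturbed copies imitate two dropout passes over the same input.
view_a = docs + 0.01 * rng.normal(size=docs.shape)
view_b = docs + 0.01 * rng.normal(size=docs.shape)

aligned = simcse_loss(view_a, view_b)
# Rolling view_b by one row pairs every document with the wrong partner.
shuffled = simcse_loss(view_a, np.roll(view_b, 1, axis=0))
print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {shuffled:.4f}")
```

The loss is low when each document's two views are closer to each other than to any other document in the batch, and high when the pairing is broken, which is the signal the encoder would be trained on.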

