April 23, 2024, 11:07 p.m. | /u/Jesse_marqo

Machine Learning www.reddit.com

A generalisation of CLIP's popular contrastive training method, better suited for search and recommendations.

Paper: https://arxiv.org/pdf/2404.08535.pdf

Github: https://github.com/marqo-ai/GCL

Generalises CLIP:

* Use any number of text and/or images to represent documents.
* Better text understanding by having both inter- and intra-modal losses.
* Can encode rank/importance/relevance, a.k.a “rank-tune”.
* Works with pretrained text and CLIP models.
* Can learn uni- or multi-vector representations for documents.
* Works with binary and Matryoshka methods.
* Open-source 10M-row multi-modal dataset …
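To make the "any number of fields" and "rank-tune" bullets concrete, here is a minimal sketch of the two ideas: a document vector built as a weighted combination of several field embeddings, and an InfoNCE-style loss where positives are weighted by a relevance score. All names, weights, and the exact loss form are illustrative assumptions, not the paper's implementation (see the GCL repo for the real code).

```python
import math

def combine_fields(field_vecs, field_weights):
    # Weighted sum of per-field embeddings (e.g. title text + image)
    # into a single unit-normalised document vector. Weights are
    # hypothetical; GCL learns/configures them per field.
    dim = len(field_vecs[0])
    out = [0.0] * dim
    for vec, w in zip(field_vecs, field_weights):
        for i, x in enumerate(vec):
            out[i] += w * x
    norm = math.sqrt(sum(x * x for x in out)) or 1.0
    return [x / norm for x in out]

def rank_weighted_loss(query, docs, relevances, temperature=0.07):
    # InfoNCE-style loss where each candidate's log-probability is
    # scaled by its relevance weight -- a toy version of "rank-tuning",
    # so more-relevant documents contribute more to the objective.
    sims = [sum(q * d for q, d in zip(query, doc)) / temperature
            for doc in docs]
    log_z = math.log(sum(math.exp(s) for s in sims))
    total_w = sum(relevances) or 1.0
    return -sum((w / total_w) * (s - log_z)
                for s, w in zip(sims, relevances))
```

For example, a product document with a title embedding `[1, 0]` and an image embedding `[0, 1]` combined with weights `0.7/0.3` yields one vector; the loss then falls when the highly relevant document is also the most similar one.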

