May 15, 2024, 1:02 a.m. | /u/Jesse_marqo | r/MachineLearning (www.reddit.com)

MRL [1] (Matryoshka Representation Learning) for CLIP allows smaller-dimension embeddings to be used with little loss in fidelity. Training is modified to optimize for truncated embeddings at multiple target dimensions simultaneously, across both the vision and text encoders (a minimal sketch follows the findings below).


Key findings:

* Reducing embedding size by 4x retains ~95% of performance
* Projection layers for sub-embeddings did not help performance
* Works both in-domain and out-of-domain (zero-shot) on multi-modal retrieval
* Using too many sub-embeddings degrades performance (e.g. {512, 256, 128} vs. {512, 256, 128, …})
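
For anyone who wants to see the training modification concretely, here is a minimal PyTorch sketch of a Matryoshka-style CLIP objective, not the authors' exact recipe: the standard symmetric contrastive loss is averaged over truncated prefixes of both towers' embeddings, so every prefix length is optimized at once. The dims {512, 256, 128} come from the post's example; the temperature and the random tensors standing in for encoder outputs are illustrative assumptions.

```python
# Minimal sketch of Matryoshka-style CLIP training (assumptions marked below).
import torch
import torch.nn.functional as F

# Target truncation dims, taken from the post's example set.
MATRYOSHKA_DIMS = [512, 256, 128]


def clip_loss(img, txt, temperature=0.07):
    """Standard symmetric InfoNCE loss on L2-normalized embeddings.

    The temperature value is an illustrative assumption.
    """
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(img.shape[0], device=img.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2


def matryoshka_clip_loss(img_emb, txt_emb, dims=MATRYOSHKA_DIMS):
    """Average the CLIP loss over truncated prefixes of both embeddings.

    Each prefix is used directly, with no per-dimension projection layers
    (which the post reports did not help performance).
    """
    losses = [clip_loss(img_emb[:, :d], txt_emb[:, :d]) for d in dims]
    return torch.stack(losses).mean()


if __name__ == "__main__":
    # Hypothetical stand-ins for the vision and text tower outputs
    # (batch of 8, full embedding dim 512).
    img_emb = torch.randn(8, 512, requires_grad=True)
    txt_emb = torch.randn(8, 512, requires_grad=True)
    loss = matryoshka_clip_loss(img_emb, txt_emb)
    loss.backward()  # gradients flow through every truncated view
    print(f"matryoshka loss = {loss.item():.4f}")
```

Because every prefix is trained directly, the first d dimensions of the final embedding can simply be truncated and used on their own at retrieval time.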
