May 5, 2024, 3:41 a.m. | /u/Tensor_Devourer_56

Machine Learning www.reddit.com

Hi, as stated in the title, I'm curious whether such methods exist. We know that a trained CLIP's image and text encoders each output a 1D vector, and that these vectors are aligned in the latent space, which makes it easy to compute the similarities between a batch of images and a batch of texts. However, in many vision applications it is desirable to get a 3D feature map of shape C\*H\*W. Ideally, if the vector at each spatial location in this feature map is as high-quality as …
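To make the shapes in the question concrete, here is a minimal sketch in NumPy. It is not CLIP itself: the random arrays stand in for encoder outputs, and the helper names `l2_normalize`, `batch_similarity`, and `dense_similarity` are hypothetical, introduced only for illustration. It shows the usual cosine-similarity computation between a batch of 1D image and text vectors, and how the same computation would apply per spatial location if a C\*H\*W feature map living in the same latent space were available.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps guards against zero norm)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def batch_similarity(image_emb, text_emb):
    """Cosine similarity between N image vectors (N, C) and M text vectors (M, C).

    Returns an (N, M) matrix.
    """
    return l2_normalize(image_emb) @ l2_normalize(text_emb).T


def dense_similarity(feature_map, text_emb):
    """Cosine similarity between each location of a (C, H, W) map and M text vectors.

    Returns an (M, H, W) array: one similarity map per text vector.
    Assumes the per-location vectors share the text embedding space.
    """
    c, h, w = feature_map.shape
    pixels = feature_map.reshape(c, h * w).T   # (H*W, C): one vector per location
    sims = batch_similarity(pixels, text_emb)  # (H*W, M)
    return sims.T.reshape(-1, h, w)            # (M, H, W)


# Random stand-ins for encoder outputs (assumption: not real CLIP features).
rng = np.random.default_rng(0)
C, H, W = 8, 4, 5
image_emb = rng.normal(size=(3, C))       # 3 images, one 1D vector each
text_emb = rng.normal(size=(2, C))        # 2 texts, one 1D vector each
feature_map = rng.normal(size=(C, H, W))  # hypothetical dense feature map

print(batch_similarity(image_emb, text_emb).shape)    # (3, 2)
print(dense_similarity(feature_map, text_emb).shape)  # (2, 4, 5)
```

The dense case is only meaningful if each spatial vector is actually aligned with the text embeddings, which is exactly the property the question asks about.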

