How does "clip-vit-large-patch14" aggregate the text sequence representation into a singular vector that represents the entire sequence? There is no [CLS] token, but [SOT] and [EOT] tokens. [Research]
April 8, 2024, 1:09 p.m. | /u/tommilyjonesOG
Machine Learning www.reddit.com
I have the following question:
How does "clip-vit-large-patch14" aggregate the text sequence representation into a singular vector that represents the entire sequence? There is no [CLS] token, but [SOT] and [EOT] tokens.

When I use the CLIP text encoder and extract the pooler_output, how exactly is this vector created? Is the [SOT] token used as a [CLS] token, or does a pooling operation take place?
Best regards,
Tom
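One way to check is to look at how the Hugging Face `transformers` implementation of CLIP's text model computes `pooler_output`: to the best of my knowledge it does not use [SOT] as a [CLS] token and performs no mean pooling; it selects the final-layer hidden state at the [EOT] position, which it locates via `argmax` over `input_ids` (the [EOT] id is the largest id in CLIP's vocabulary). Here is a minimal sketch of that selection with made-up tensors, so the behavior can be verified without downloading the model (`clip_style_pool` is a hypothetical helper name, not a library function):

```python
import torch

def clip_style_pool(last_hidden_state: torch.Tensor,
                    input_ids: torch.Tensor) -> torch.Tensor:
    """Select the hidden state at each sequence's [EOT] position,
    mimicking (as I understand it) CLIPTextModel's pooler_output."""
    # [EOT] has the largest token id in CLIP's vocab, so argmax finds it.
    eot_positions = input_ids.argmax(dim=-1)            # shape: (batch,)
    batch_idx = torch.arange(last_hidden_state.size(0))
    return last_hidden_state[batch_idx, eot_positions]  # (batch, hidden)

# Toy batch: 49406 = [SOT], 49407 = [EOT] in CLIP's tokenizer; 0 = padding.
input_ids = torch.tensor([[49406, 320, 1125, 49407, 0, 0]])
hidden = torch.randn(1, 6, 8)   # fake (batch, seq, hidden) activations

pooled = clip_style_pool(hidden, input_ids)
assert pooled.shape == (1, 8)
assert torch.equal(pooled[0], hidden[0, 3])  # row at the [EOT] position
```

To confirm against the real model, one could run `CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")` and compare `outputs.pooler_output` with `outputs.last_hidden_state` gathered at the [EOT] positions as above.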