all AI news
[P] In most Multimodal LLMs, where are the image embeddings given to the model?
Jan. 11, 2024, 3:37 p.m. | /u/vatsadev
Machine Learning www.reddit.com
ViT/Clip, I would need the entire clip model, which is anywhere from 30x to 5x my transformer size, so its harder to pick Fuyu, from what I've found, runs image patches through an MLP, which is way smaller, but im not sure where the embeddings go
How do I replace tokens with embeddings?
clip embeddings found image machinelearning mlp through tokens transformer vit
More from www.reddit.com / Machine Learning
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Lead Data Scientist, Commercial Analytics
@ Checkout.com | London, United Kingdom
Data Engineer I
@ Love's Travel Stops | Oklahoma City, OK, US, 73120