Jan. 11, 2024, 3:37 p.m. | /u/vatsadev

r/MachineLearning · www.reddit.com

I have a Colab notebook with a super simple Andrej Karpathy-style GPT (https://colab.research.google.com/drive/17j0xI5n-wRK3c6BQagCEbw38EJ39M7G3?usp=sharing), and I wanted to try adding a ViT/CLIP/Fuyu-style image embedding to it.

For the ViT/CLIP route, I would need the entire CLIP model, which is anywhere from 5x to 30x my transformer's size, so that's the harder option to pick. Fuyu, from what I've found, just runs image patches through an MLP, which is way smaller, but I'm not sure where the resulting embeddings go.

How do I replace tokens with embeddings?
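Here's my current guess at the Fuyu route as a PyTorch sketch, since that's the lighter option. To be clear, this is my reading of the approach, not actual Fuyu code, and the names (PatchEmbed, n_embd, wte) are nanoGPT-style placeholders I picked, not anything from the Fuyu repo: flatten each image patch, project it with a single linear layer straight into the transformer's embedding space, and let the patches act as extra "tokens" at the front of the context.

    import torch
    import torch.nn as nn

    class PatchEmbed(nn.Module):
        """Fuyu-style patch embedding: one linear projection from
        flattened image patches into the transformer's embedding
        space, with no separate vision encoder."""
        def __init__(self, patch_size=16, in_chans=3, n_embd=384):
            super().__init__()
            self.patch_size = patch_size
            # (patch_size * patch_size * channels) -> n_embd, per patch
            self.proj = nn.Linear(patch_size * patch_size * in_chans, n_embd)

        def forward(self, images):
            # images: (B, C, H, W), with H and W divisible by patch_size
            B, C, H, W = images.shape
            p = self.patch_size
            # cut into non-overlapping p x p patches and flatten each one
            patches = images.unfold(2, p, p).unfold(3, p, p)  # (B, C, H/p, W/p, p, p)
            patches = patches.permute(0, 2, 3, 1, 4, 5)       # (B, H/p, W/p, C, p, p)
            patches = patches.reshape(B, -1, C * p * p)       # (B, N, C*p*p)
            return self.proj(patches)                         # (B, N, n_embd)

    # quick shape check
    pe = PatchEmbed(patch_size=16, in_chans=3, n_embd=384)
    imgs = torch.randn(2, 3, 64, 64)   # two 64x64 RGB images -> 16 patches each
    print(pe(imgs).shape)              # torch.Size([2, 16, 384])

And for the "where do the embeddings go" part, as far as I can tell you don't replace tokens at the token-id level; you concatenate at the embedding level, right after the token embedding lookup and before the first transformer block (hypothetical nanoGPT-style forward):

    # img_emb = self.patch_embed(images)        # (B, N, n_embd)
    # tok_emb = self.transformer.wte(idx)       # (B, T, n_embd)
    # x = torch.cat([img_emb, tok_emb], dim=1)  # patches prepended as pseudo-tokens

The causal mask and position embeddings then treat the N patch slots as ordinary positions, and you'd only compute the loss on the text positions.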
