June 8, 2024

Deep Learning

Hi guys!

This is just a clarification post. As far as I understand, in self-attention the key (K), query (Q), and value (V) vectors all come from the **same** **embeddings**. Let me explain: we multiply the same embeddings **by different weight matrices** (W_K, W_Q, and W_V) to project them, and then we operate on the resulting vectors. Am I getting this right?
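
To make the question concrete, here is a minimal NumPy sketch of what I have in mind for a single self-attention head. The sizes, the random weights, and all variable names (`X`, `W_Q`, `W_K`, `W_V`, and so on) are made up for illustration; in a real model the weight matrices would be learned, not sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, d_model, d_k = 4, 8, 3

# One embedding vector per token: the single shared input.
X = rng.normal(size=(seq_len, d_model))

# Three separate weight matrices (random here, learned in practice).
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

# The same embeddings X are projected three different ways.
Q = X @ W_Q
K = X @ W_K
V = X @ W_V

# Scaled dot-product attention over those projections.
scores = Q @ K.T / np.sqrt(d_k)
scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V

print(Q.shape, K.shape, V.shape, output.shape)
```

So `Q`, `K`, and `V` all start from the same `X` and differ only in which matrix they were multiplied by, which is the part I want to confirm.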

Thank you!

