Jan. 19, 2024, 2:48 a.m. | /u/Melodic_Stomach_2704

Machine Learning www.reddit.com

I've been following [Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/) to implement the transformer architecture. In the multi-head attention class's `forward()` method, the query, key, and value are multiplied by their corresponding projection matrices `W_q`, `W_k`, `W_v`:

```python
query, key, value = [
    lin(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
    for lin, x in zip(self.linears, (query, key, value))
]
```

Here, `lin(x)` is reshaped into `(nbatches, -1, self.h, self.d_k)`, and then dimensions 1 and 2 are transposed, which makes the shape `(nbatches, self.h, -1, self.d_k)`.
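To make the shape bookkeeping concrete, here is a small sketch tracing one of the three projections with hypothetical sizes (`nbatches=2`, `seq_len=5`, `h=4`, `d_k=8`; a single `nn.Linear` stands in for one element of `self.linears`):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
nbatches, seq_len, h, d_k = 2, 5, 4, 8
d_model = h * d_k  # 32

lin = torch.nn.Linear(d_model, d_model)  # stands in for one of self.linears
x = torch.randn(nbatches, seq_len, d_model)

projected = lin(x)                             # (2, 5, 32)
split = projected.view(nbatches, -1, h, d_k)   # (2, 5, 4, 8): split d_model into h heads of d_k
per_head = split.transpose(1, 2)               # (2, 4, 5, 8): head axis moved before sequence axis

print(per_head.shape)  # torch.Size([2, 4, 5, 8])
```

The transpose puts the head axis next to the batch axis, so a subsequent batched matmul like `query @ key.transpose(-2, -1)` computes a separate `(seq_len, seq_len)` attention score matrix for each of the `h` heads at once.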

I'm failing to understand why …
