Jan. 19, 2024, 2:48 a.m. | /u/Melodic_Stomach_2704

Machine Learning www.reddit.com

I've been following [Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/) to implement the transformer architecture. In the multi-head attention class's `forward()` method, the query, key, and value are multiplied by their corresponding projection matrices `W_q`, `W_k`, `W_v`:

```python
query, key, value = [
    lin(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
    for lin, x in zip(self.linears, (query, key, value))
]
```

Here, `lin(x)` is reshaped into `(nbatches, -1, self.h, self.d_k)`, and then dimensions 1 and 2 are transposed, which makes the shape `(nbatches, self.h, -1, self.d_k)`.
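To make the shape bookkeeping concrete, here is a small sketch tracing one of the three projections with hypothetical sizes (`nbatches=2`, `seq_len=5`, `h=4`, `d_k=8`; a single `nn.Linear` stands in for one element of `self.linears`):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
nbatches, seq_len, h, d_k = 2, 5, 4, 8
d_model = h * d_k  # 32

lin = torch.nn.Linear(d_model, d_model)  # stands in for one of self.linears
x = torch.randn(nbatches, seq_len, d_model)

projected = lin(x)                             # (2, 5, 32)
split = projected.view(nbatches, -1, h, d_k)   # (2, 5, 4, 8): split d_model into h heads of d_k
per_head = split.transpose(1, 2)               # (2, 4, 5, 8): head axis moved before sequence axis

print(per_head.shape)  # torch.Size([2, 4, 5, 8])
```

The transpose puts the head axis next to the batch axis, so a subsequent batched matmul like `query @ key.transpose(-2, -1)` computes a separate `(seq_len, seq_len)` attention score matrix for each of the `h` heads at once.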

I'm failing to understand why …
