Nov. 26, 2023, 9:59 p.m. | /u/lildaemon

r/MachineLearning · www.reddit.com

The only time the query and key matrices are used is to compute the attention scores, that is, $v_i^T W_q^T W_k v_j$. But what is actually used is the product $W_q^T W_k$. Why not replace $W_q^T W_k$ with a single matrix $W_{qk}$, and learn that product directly instead of the two factors themselves? How does it help to have two matrices instead of one? And if it helps, why is that not done …
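For concreteness, here is a minimal numpy sketch of the equivalence the question describes (all shapes and variable names are illustrative, not from the post): the two-matrix score and the folded single-matrix score are numerically identical, and the sketch makes the parameter-count and rank difference between the two parameterizations visible.

```python
import numpy as np

# Assumed setup (not from the post): single-head attention with
# model dimension d_model and head dimension d_head < d_model.
d_model, d_head = 8, 2
rng = np.random.default_rng(0)

W_q = rng.normal(size=(d_head, d_model))  # query projection
W_k = rng.normal(size=(d_head, d_model))  # key projection
v_i = rng.normal(size=d_model)            # embedding of token i
v_j = rng.normal(size=d_model)            # embedding of token j

# Two-matrix form: score = (W_q v_i)^T (W_k v_j) = v_i^T W_q^T W_k v_j
score_two = (W_q @ v_i) @ (W_k @ v_j)

# Single-matrix form: fold the product into one matrix W_qk = W_q^T W_k
W_qk = W_q.T @ W_k                        # (d_model, d_model), rank <= d_head
score_one = v_i @ W_qk @ v_j

assert np.allclose(score_two, score_one)  # identical attention scores

# Parameter count: 2 * d_head * d_model (factored) vs d_model ** 2 (folded).
print(score_two, score_one)
```

One thing the sketch surfaces: since $W_q$ and $W_k$ each map to the smaller head dimension, the factored form uses $2 \, d_{head} \, d_{model}$ parameters rather than $d_{model}^2$, and it constrains $W_q^T W_k$ to rank at most $d_{head}$, so the two parameterizations are not equivalent in capacity or cost even though any given factored score can be reproduced by a single matrix.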

