March 10, 2024, 6:10 p.m. | /u/datashri

Machine Learning www.reddit.com

I'm starting to get the hang of the attention paper and the significance of Q, K and V, of dot-product attention, and of multi-head attention.

What I don't understand is how the values of the Q, K, and V matrices are trained. I've read the Cross Validated Stack Exchange answers and some others. I get that Q, K, and V come from the previous layer, but how does the previous layer determine/train their values?

Some form of backprop, sure. But what's the goal …
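
To make the question concrete, here is a minimal PyTorch sketch (the class and parameter names are my own, not from the paper): Q, K and V are just linear projections of the previous layer's output, and the projection weights W_Q, W_K, W_V are the trainable part. They get gradients through the task loss like any other linear layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadAttention(nn.Module):
    """Minimal scaled dot-product attention with learnable Q/K/V projections."""
    def __init__(self, d_model, d_head):
        super().__init__()
        # These three projection matrices are the trainable parameters;
        # Q, K, V themselves are activations computed from them.
        self.w_q = nn.Linear(d_model, d_head, bias=False)
        self.w_k = nn.Linear(d_model, d_head, bias=False)
        self.w_v = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x):
        # x: (batch, seq_len, d_model) -- the output of the previous layer
        q = self.w_q(x)
        k = self.w_k(x)
        v = self.w_v(x)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)
        return attn @ v

# Training is ordinary backprop: a task loss (e.g. cross-entropy on the next
# token) is differentiated through the softmax and matmuls, and the gradients
# update w_q / w_k / w_v exactly like any other weight matrix.
layer = SingleHeadAttention(d_model=64, d_head=16)
x = torch.randn(2, 10, 64)
out = layer(x)
loss = out.pow(2).mean()            # placeholder loss, just to show gradient flow
loss.backward()
print(layer.w_q.weight.grad.shape)  # the projection weights receive gradients
```

So the "goal" of the projection weights is only implicit: whatever linear maps make the attention pattern reduce the overall training loss are what backprop converges to.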

