Oct. 2, 2022, 8:56 p.m. | /u/029187

r/MachineLearning (www.reddit.com)

So an attention layer has Q, K, and V vectors. My understanding is that the goal is to say, for a given query q, how relevant each key k is, and then use that relevance to weight the corresponding value v.

From this the network learns which data is relevant to focus on for a given input.
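For reference, here's my understanding as a minimal NumPy sketch of scaled dot-product attention (the projection matrices W_q, W_k, W_v and the toy shapes are just illustrative, not from any particular model):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: relevance of key j to query i
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over keys turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output row is a relevance-weighted sum of the values
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))               # 3 tokens, model dim 4
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                           # (3, 4)
```

The point of the sketch is that the softmax weights are computed from the input X itself rather than being fixed parameters, which is exactly the part I'm asking about below.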

But what I don't get is why this is so effective. Don't DNNs already do this with weights? A neuron in a hidden layer can be activated by an arbitrary combination of inputs, so …
