March 11, 2024, 7:56 p.m. | /u/benthehuman_

r/MachineLearning — www.reddit.com

I could have sworn I skimmed a paper around a year ago which demonstrated pretty solid performance in transformers where the Value and Key (or Query) weights were the same / shared within each attention layer. I think Linformer does something similar, but I’m not looking for something that tries to solve the quadratic runtime of attention, just something that shows you can get reasonable results with shared Value and Key weights. It might’ve even been mentioned in this subreddit. Somehow I …
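For anyone unsure what the sharing would look like in practice, here is a minimal single-head sketch, assuming the Key and Value projections reuse one weight matrix; the class and parameter names are mine for illustration and aren't taken from any particular paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    """Single-head self-attention where Key and Value share one projection.
    Hypothetical sketch of the weight-sharing idea, not a specific paper's method."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # One projection matrix reused for both keys and values.
        self.kv_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_proj(x)
        kv = self.kv_proj(x)  # the same tensor serves as both K and V
        attn = F.softmax((q @ kv.transpose(-2, -1)) * self.scale, dim=-1)
        return self.out_proj(attn @ kv)

# Quick smoke test
x = torch.randn(2, 16, 64)
out = SharedKVAttention(64)(x)
print(out.shape)  # torch.Size([2, 16, 64])
```

Compared with a standard attention layer, this drops one of the three input projections, so the parameter count and projection FLOPs of the layer shrink by roughly a third while the quadratic attention cost itself is unchanged.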

