March 11, 2024, 7:56 p.m. | /u/benthehuman_

r/MachineLearning — www.reddit.com

I could have sworn I skimmed a paper around a year ago which demonstrated pretty solid performance in transformers where the Value and Key (or Query) weights were the same / shared within each attention layer. I think Linformer does something similar, but I’m not looking for something that tries to solve the quadratic runtime of attention, just something that shows you can get reasonable results with shared Value and Key weights. It might’ve even been mentioned in this subreddit. Somehow I …
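For anyone unsure what the sharing would look like in practice, here is a minimal single-head sketch, assuming the Key and Value projections reuse one weight matrix; the class and parameter names are mine for illustration and aren't taken from any particular paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    """Single-head self-attention where Key and Value share one projection.
    Hypothetical sketch of the weight-sharing idea, not a specific paper's method."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # One projection matrix reused for both keys and values.
        self.kv_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_proj(x)
        kv = self.kv_proj(x)  # the same tensor serves as both K and V
        attn = F.softmax((q @ kv.transpose(-2, -1)) * self.scale, dim=-1)
        return self.out_proj(attn @ kv)

# Quick smoke test
x = torch.randn(2, 16, 64)
out = SharedKVAttention(64)(x)
print(out.shape)  # torch.Size([2, 16, 64])
```

Compared with a standard attention layer, this drops one of the three input projections, so the parameter count and projection FLOPs of the layer shrink by roughly a third while the quadratic attention cost itself is unchanged.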

