April 30, 2023, 10:30 a.m. | /u/supreethrao

r/MachineLearning

Hello,

I've wanted to add flash attention to models on Hugging Face (particularly the LLaMA variants). Is there a guide/playbook for adding different attention mechanisms to existing models? In the grander scheme, I'd like to build this out as a library where you pass in a model and it returns the model with a different attention mechanism. Would this be of use, given that PyTorch 2.0 already supports flash attention?
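For concreteness, here's roughly the kind of transformation I have in mind. This is a minimal sketch on a toy module, not the real Hugging Face LLaMA code: the class and function names (`NaiveAttention`, `sdpa_forward`, `swap_attention`) are made up for illustration, while `torch.nn.functional.scaled_dot_product_attention` is the real PyTorch 2.0 API, which dispatches to a FlashAttention kernel when the inputs are eligible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveAttention(nn.Module):
    """Plain softmax attention, standing in for a model's existing attention."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


def sdpa_forward(self, x: torch.Tensor) -> torch.Tensor:
    """Same projections and layout, but the core attention is computed by
    scaled_dot_product_attention (flash kernel when inputs allow it)."""
    b, t, d = x.shape
    q, k, v = self.qkv(x).chunk(3, dim=-1)
    q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
               for z in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    return self.proj(out.transpose(1, 2).reshape(b, t, d))


def swap_attention(model: nn.Module, target=NaiveAttention) -> nn.Module:
    """Walk the module tree and rebind the forward of every matching
    attention module to the SDPA-backed version, in place."""
    for module in model.modules():
        if isinstance(module, target):
            module.forward = sdpa_forward.__get__(module, type(module))
    return model
```

Doing this for the actual `LlamaAttention` module would be the same pattern, but the patched forward would also have to reproduce the rotary embeddings, attention mask handling, and past key/value caching, which I suspect is where most of the library work would be.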


Thanks!

