[D] How does multi-head attention actually work?
July 29, 2022, 2:58 p.m. | /u/jwngx
The [Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) shows eight separate sets of weight matrices, one per head. But other implementations I've seen (the [Annotated Transformer](http://nlp.seas.harvard.edu/annotated-transformer/#full-model) and Aleksa Gordić's [implementation](https://github.com/gordicaleksa/pytorch-original-transformer/blob/main/models/definitions/transformer_model.py), as well as his video on his popular channel The AI Epiphany) seem to …
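The snippet cuts off, but the contrast usually being drawn here is worth spelling out: most PyTorch code (the Annotated Transformer included) uses a single fused `d_model × d_model` projection for the queries (and likewise for keys and values) and then reshapes the result into heads, while the Illustrated Transformer draws eight separate smaller matrices. The two are the same computation, just batched differently. A minimal PyTorch sketch of the equivalence; names like `per_head_W` and `fused_W` are illustrative, not taken from either repo:

```python
import torch

torch.manual_seed(0)
d_model, n_heads = 512, 8
head_dim = d_model // n_heads      # 64, as in "Attention Is All You Need"
x = torch.randn(2, 10, d_model)    # (batch, seq_len, d_model)

# Variant A: eight separate weight matrices, one per head
# (the Illustrated Transformer's picture).
per_head_W = [torch.randn(d_model, head_dim) for _ in range(n_heads)]
queries_a = [x @ W for W in per_head_W]  # n_heads tensors of shape (2, 10, 64)

# Variant B: one fused projection, then reshape into heads
# (the single-big-matrix style seen in most code).
fused_W = torch.cat(per_head_W, dim=-1)  # (d_model, d_model)
q = x @ fused_W                          # (2, 10, 512)
queries_b = q.view(2, 10, n_heads, head_dim).permute(2, 0, 1, 3)  # (8, 2, 10, 64)

# The two variants agree head by head (up to float rounding).
for i in range(n_heads):
    assert torch.allclose(queries_a[i], queries_b[i], atol=1e-4)
print("per-head and fused projections match")
```

Implementations fuse the per-head projections mainly because one large matmul is faster on GPUs than eight small ones; the split into heads only matters when computing the attention scores.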