Sept. 7, 2023, 8:02 p.m. | Nieves Crasto

Towards AI - Medium | pub.towardsai.net

Photo by Devin Avery on Unsplash

In this article, we will take a deep dive into the concept of attention in Transformer networks, particularly from the encoder’s perspective. We will cover the following topics:

  • What is machine translation?
  • Why attention is needed.
  • How is attention computed using Recurrent Neural Networks (RNNs)?
  • What is self-attention, and how is it computed in the Transformer’s encoder? (A minimal sketch follows this list.)
  • Multi-headed attention in the encoder.
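
To make the self-attention item above concrete, here is a minimal NumPy sketch of scaled dot-product self-attention for a single head. The function name, matrix shapes, and the random toy inputs are illustrative assumptions, not code from the article.

```python
import numpy as np

def scaled_dot_product_self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token embeddings X.

    X:             (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices (illustrative values)
    """
    Q = X @ W_q                               # queries
    K = X @ W_k                               # keys
    V = X @ W_v                               # values
    d_k = Q.shape[-1]
    # Attention scores: every token attends to every token in the sequence.
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (seq_len, d_k)

# Toy example: 4 tokens, d_model = 8, d_k = 4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = scaled_dot_product_self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 4)
```

Multi-headed attention, covered later, simply runs several such projections in parallel and concatenates the results.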

Machine Translation

We will look at neural machine translation (NMT) as a running …

Tags: attention, multi-head attention, NLP, self-attention, transformers
