March 2, 2024, 2:46 p.m. | /u/mono1110

Deep Learning www.reddit.com

I am trying to solve a sentiment classification problem using a self-attention mechanism. The architecture is simple: one self-attention head and one feedforward layer, followed by an output layer.

Initially, positional encoding was added. The model overfitted (I will work on mitigating that). Then I got curious about what would happen if I removed the positional encoding.
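For reference, here is a minimal sketch of the kind of architecture described, assuming PyTorch, learned positional embeddings, and mean pooling over positions before the output layer (those details are not given in the post); the `use_positional_encoding` flag toggles the experiment in question:

```python
# Minimal single-head self-attention classifier sketch (assumed details noted above).
import torch
import torch.nn as nn


class TinySentimentClassifier(nn.Module):
    def __init__(self, vocab_size, d_model=64, max_len=256,
                 num_classes=2, use_positional_encoding=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.use_positional_encoding = use_positional_encoding
        # Learned positional embeddings; sinusoidal encodings would also work.
        self.pos_embed = nn.Embedding(max_len, d_model)
        # One self-attention head.
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        # One feedforward layer.
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
        # Output layer.
        self.out = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                        # (batch, seq, d_model)
        if self.use_positional_encoding:
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            x = x + self.pos_embed(positions)
        attn_out, _ = self.attn(x, x, x)                 # self-attention over the sequence
        x = self.ff(attn_out)
        return self.out(x.mean(dim=1))                   # pool over positions, then classify


# Example: same model with positional encoding removed.
model = TinySentimentClassifier(vocab_size=10_000, use_positional_encoding=False)
logits = model(torch.randint(0, 10_000, (8, 32)))        # (batch=8, num_classes)
```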

The model still overfitted.

Any thoughts on why?

Thanks.
