March 2, 2024, 2:46 p.m. | /u/mono1110

Deep Learning www.reddit.com

I am trying to solve a sentiment classification problem using a self-attention mechanism. The architecture is simple: one self-attention head and one feedforward layer, followed by an output layer.

Initially, positional encoding was added. The model overfitted (I will work on mitigating that). Then I got curious about what would happen if I removed the positional encoding.
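For reference, here is a minimal sketch of the kind of architecture described, assuming PyTorch, learned positional embeddings, and mean pooling over positions before the output layer (those details are not given in the post); the `use_positional_encoding` flag toggles the experiment in question:

```python
# Minimal single-head self-attention classifier sketch (assumed details noted above).
import torch
import torch.nn as nn


class TinySentimentClassifier(nn.Module):
    def __init__(self, vocab_size, d_model=64, max_len=256,
                 num_classes=2, use_positional_encoding=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.use_positional_encoding = use_positional_encoding
        # Learned positional embeddings; sinusoidal encodings would also work.
        self.pos_embed = nn.Embedding(max_len, d_model)
        # One self-attention head.
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        # One feedforward layer.
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
        # Output layer.
        self.out = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                        # (batch, seq, d_model)
        if self.use_positional_encoding:
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            x = x + self.pos_embed(positions)
        attn_out, _ = self.attn(x, x, x)                 # self-attention over the sequence
        x = self.ff(attn_out)
        return self.out(x.mean(dim=1))                   # pool over positions, then classify


# Example: same model with positional encoding removed.
model = TinySentimentClassifier(vocab_size=10_000, use_positional_encoding=False)
logits = model(torch.randint(0, 10_000, (8, 32)))        # (batch=8, num_classes)
```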

The model still overfitted.

Any thoughts on why?

Thanks.
