Feb. 11, 2024, 11:35 a.m. | /u/mono1110

Deep Learning www.reddit.com

For example, take the transformer architecture or the attention mechanism. How did the researchers know that combining self-attention with layer normalisation and positional encoding would give models that outperform LSTMs and CNNs?

I am asking this from the perspective of mathematics. Currently I feel like I could never come up with something new, and that there is something missing which AI researchers know and I don't.

So what do I need to know that will allow me to solve problems in new …
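For concreteness, here is a minimal sketch of the pieces the post names, self-attention, layer normalisation, and sinusoidal positional encoding, composed into one encoder-style block. It is written in plain NumPy; the function names, shapes, and random weights are illustrative assumptions, not the reference implementation from the original paper, and the feed-forward sub-layer and multi-head splitting are omitted for brevity.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding: even dims use sine, odd dims use cosine.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def layer_norm(x, eps=1e-5):
    # Normalise each token vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v

def encoder_block(x, Wq, Wk, Wv):
    # Residual connection plus layer norm around the attention sub-layer,
    # mirroring the combination the post asks about.
    return layer_norm(x + self_attention(x, Wq, Wk, Wv))

# Toy usage: 4 tokens, model dimension 8, random projection weights.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(encoder_block(x, Wq, Wk, Wv).shape)  # (4, 8)
```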

