Feb. 11, 2024, 11:35 a.m. | /u/mono1110

Deep Learning www.reddit.com

For example, take the transformer architecture or the attention mechanism. How did the researchers know that combining self-attention with layer normalisation and positional encoding would give models that outperform LSTMs and CNNs?
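To make the combination concrete, here is a minimal sketch (not the original paper's code; all module and parameter names are illustrative) of how those pieces fit together in one encoder block: fixed sinusoidal positional encoding added to the inputs, multi-head self-attention, and residual connections wrapped in layer normalisation.

```python
import math
import torch
import torch.nn as nn


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sin/cos positional encodings, shape (seq_len, d_model)."""
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe


class EncoderBlock(nn.Module):
    """Self-attention + feed-forward, each wrapped in a residual connection and LayerNorm."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)  # self-attention: queries = keys = values = x
        x = self.norm1(x + attn_out)      # residual connection + layer normalisation
        x = self.norm2(x + self.ff(x))    # position-wise feed-forward + layer normalisation
        return x


# Usage: add positional information to the embeddings, then run one block.
tokens = torch.randn(2, 10, 64)                           # (batch, seq_len, d_model)
tokens = tokens + sinusoidal_positional_encoding(10, 64)  # inject position info
print(EncoderBlock()(tokens).shape)                       # torch.Size([2, 10, 64])
```

Of course, a sketch like this only shows *what* the combination is, not *why* the authors expected it to work, which is the actual question.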

I am asking this from the perspective of mathematics. Currently I feel like I could never come up with something new, as if there is something AI researchers know that I don't.

So what do I need to know that will allow me to solve problems in new …
