Feb. 11, 2024, 11:35 a.m. | /u/mono1110

Deep Learning www.reddit.com

For example, take the transformer architecture or the attention mechanism. How did they know that by combining self-attention with layer normalisation and positional encoding, we could get models that would outperform LSTMs and CNNs?
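
For reference, here is a minimal sketch of how the pieces named above fit together in one transformer encoder block, assuming PyTorch; the module names, sizes, and toy usage are illustrative, not something from the original post or paper:

    # Minimal sketch (PyTorch assumed): self-attention + layer normalisation
    # + sinusoidal positional encoding composed into one encoder block.
    import math
    import torch
    import torch.nn as nn

    class SinusoidalPositionalEncoding(nn.Module):
        """Adds a fixed sinusoidal position signal to token embeddings."""
        def __init__(self, d_model: int, max_len: int = 512):
            super().__init__()
            pos = torch.arange(max_len).unsqueeze(1)                       # (max_len, 1)
            div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pe", pe)

        def forward(self, x):                                              # x: (batch, seq, d_model)
            return x + self.pe[: x.size(1)]

    class TransformerEncoderBlock(nn.Module):
        """Self-attention and a feed-forward net, each with a residual + LayerNorm."""
        def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            attn_out, _ = self.attn(x, x, x)      # self-attention: queries, keys, values all come from x
            x = self.norm1(x + attn_out)          # residual connection + layer normalisation
            x = self.norm2(x + self.ff(x))        # position-wise feed-forward, same residual pattern
            return x

    # Toy usage: a batch of 2 sequences, 10 tokens each, embedded in 64 dimensions.
    x = torch.randn(2, 10, 64)
    x = SinusoidalPositionalEncoding(d_model=64)(x)
    out = TransformerEncoderBlock()(x)
    print(out.shape)  # torch.Size([2, 10, 64])

The sketch only shows that the components compose cleanly; it says nothing about why this particular composition wins, which is exactly the question.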

I am asking this from the perspective of mathematics. Currently I feel like I can never come up with something new, and that there is something missing that AI researchers know and I don't.

So what do I need to know that will allow me to solve problems in new …
