all AI news
Encoder-Decoder Model
Nov. 15, 2023, 1:23 p.m. | /u/duffano
Deep Learning www.reddit.com
I had a look at the encoder-decoder architecture following the seminal paper "Attention Is All You Need".
After doing experiments on my own and doing further reading, I found many sources saying that the (maximum) input lengths of the encoder and decoder are usually the same, or that there is no practical reason to use different lengths (see e.g. [https://stats.stackexchange.com/questions/603535/in-transformers-for-the-maximum-length-of-encoders-input-sequences-and-decoder](https://stats.stackexchange.com/questions/603535/in-transformers-for-the-maximum-length-of-encoders-input-sequences-and-decoder)).
What puzzles me is the "usually". I want to understand this at the mathematical level, and I …
Tags: architecture, attention, attention is all you need, decoder, deeplearning, encoder, encoder-decoder, look, paper, practice, reason
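On the mathematical point the post raises: nothing in the transformer's equations forces the encoder and decoder maximum lengths to match, since cross-attention takes queries from the decoder (length T) and keys/values from the encoder (length S), producing a T×S attention matrix for any S and T. A minimal PyTorch sketch below illustrates this; the model, its sizes, and the use of learned positional embeddings are illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal sketch (illustrative, not from the post): an encoder-decoder
# transformer whose encoder and decoder maximum lengths deliberately differ.
# Each side only needs its own positional embedding table.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, vocab=1000, d_model=64,
                 max_src_len=512, max_tgt_len=128):  # different on purpose
        super().__init__()
        self.src_tok = nn.Embedding(vocab, d_model)
        self.tgt_tok = nn.Embedding(vocab, d_model)
        # Separate learned positional tables, one per maximum length.
        self.src_pos = nn.Embedding(max_src_len, d_model)
        self.tgt_pos = nn.Embedding(max_tgt_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, src, tgt):
        # src: (batch, S), S <= max_src_len; tgt: (batch, T), T <= max_tgt_len.
        s_pos = torch.arange(src.size(1), device=src.device)
        t_pos = torch.arange(tgt.size(1), device=tgt.device)
        enc_in = self.src_tok(src) + self.src_pos(s_pos)
        dec_in = self.tgt_tok(tgt) + self.tgt_pos(t_pos)
        # Cross-attention handles S != T: decoder queries (length T) attend
        # over encoder keys/values (length S). Causal masking is omitted
        # here for brevity since this only demonstrates shapes.
        h = self.transformer(enc_in, dec_in)
        return self.out(h)

model = TinyEncoderDecoder()
src = torch.randint(0, 1000, (2, 300))  # source longer than target
tgt = torch.randint(0, 1000, (2, 50))
print(model(src, tgt).shape)  # torch.Size([2, 50, 1000])
```

In this sketch the "usually the same" convention is just that: a convention. Equal maximums simplify shared positional encodings and batching, but the attention math itself places no constraint tying the two lengths together.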
More from www.reddit.com / Deep Learning
Classification of images with numerical "continuous" categories
2 days, 16 hours ago | www.reddit.com
How does gradient descent work in random forests?
3 days, 8 hours ago | www.reddit.com
Prerequisites for jumping into transformers?
3 days, 10 hours ago | www.reddit.com
[Reading] Deep Learning by Goodfellow
3 days, 16 hours ago | www.reddit.com
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US