June 6, 2022, 10:34 a.m. | /u/ShujiMikami

Machine Learning www.reddit.com

The decoder in question comes from [this article](https://arxiv.org/pdf/2104.13332.pdf) and is responsible for converting a set of features obtained from a sequence of frames into an audio sample of 40ms, or 640 samples total.

https://preview.redd.it/6a702r0y5z391.png?width=901&format=png&auto=webp&s=67020e8e41fadaf1e22824430ab78d93556cdcc9

The problem arises when you begin doing calculations with the parameters presented by the authors. Even assuming that only stride would correspond to an increase in the length of output sequence, the total length would come out to \[batch, 1, 1280\] - twice the size presented …

convolution machinelearning making sense

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Data Analyst - Associate

@ JPMorgan Chase & Co. | Mumbai, Maharashtra, India

Staff Data Engineer (Data Platform)

@ Coupang | Seoul, South Korea

AI/ML Engineering Research Internship

@ Keysight Technologies | Santa Rosa, CA, United States

Sr. Director, Head of Data Management and Reporting Execution

@ Biogen | Cambridge, MA, United States

Manager, Marketing - Audience Intelligence (Senior Data Analyst)

@ Delivery Hero | Singapore, Singapore