June 27, 2022, 12:56 a.m. | /u/WigglyHypersurface

Machine Learning www.reddit.com

I've been looking into using the Perciever for a project that involves single-channel (mono) audio. From the existing implementations and tutorials, I can't find one that only does audio. It seems like in the papers they rearrange the audio into patches and add position encodings, but this is a hack to bring the audio modality into the same size tensor as other modalities. If only using 1d audio is there any need at all for position encodings at all?

audio io machinelearning

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Data Strategy & Management - Private Equity Sector - Manager - Consulting - Location OPEN

@ EY | New York City, US, 10001-8604

Data Engineer- People Analytics

@ Volvo Group | Gothenburg, SE, 40531