June 16, 2024, 3:18 a.m. | /u/ivanstepanovftw

Machine Learning www.reddit.com

It seems counter-intuitive that most successful audio frameworks use 2-dimensional convolutional neural networks (CNNs), so I experimented with simple frameworks while training on [BirdCLEF-2024 on Kaggle](https://www.kaggle.com/competitions/birdclef-2024), and I have some questions about learning:

1. When learning from waveform input, why does a 1D CNN fail to converge, and even diverge immediately, on the validation split?
2. When training on spectrogram magnitude (stft -> abs -> log1p), why does a 1D CNN perform worse than a 2D CNN?
3. While it seems that spectrogram …
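
For anyone who wants to reproduce the setup from question 2, here is a minimal sketch of the `stft -> abs -> log1p` pipeline and of the two input layouts being compared. This is not my training code: the sample rate, `n_fft`, `hop`, the test tone, and the helper name `log_magnitude_spectrogram` are all assumptions chosen for illustration, and it uses plain NumPy rather than any particular audio framework.

```python
import numpy as np


def log_magnitude_spectrogram(wave, n_fft=512, hop=128):
    """stft -> abs -> log1p. Returns an array of shape (n_fft // 2 + 1, n_frames).

    Hypothetical helper: n_fft and hop are illustrative defaults, not tuned values.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames: (n_frames, n_fft).
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame gives the STFT: (n_frames, n_fft // 2 + 1).
    stft = np.fft.rfft(frames, axis=1)
    # Magnitude, then log1p compression; transpose to (freq, time).
    return np.log1p(np.abs(stft)).T


sr = 32000  # assumed sample rate
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 4000 * t).astype(np.float32)  # 1 s synthetic 4 kHz tone

spec = log_magnitude_spectrogram(wave)  # shape (257, 247)

# Layout for a 1D CNN: frequency bins act as input channels, convolution runs over time.
input_1d = spec[np.newaxis]  # (batch=1, channels=257, time=247)

# Layout for a 2D CNN: a single-channel "image", convolution runs over frequency and time.
input_2d = spec[np.newaxis, np.newaxis]  # (batch=1, channels=1, freq=257, time=247)
```

One difference this makes visible: in the 1D layout every kernel spans all 257 frequency bins at once and shares weights only along time, while in the 2D layout a small kernel is shared along both frequency and time.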

