May 1, 2024, 1:43 p.m. | /u/Puzzleheaded_Bee5489

Deep Learning www.reddit.com

I'm working on a Speaker Verification project in which I'm exploring different techniques for verifying a speaker by voice. The traditional approach is to extract [MFCC](https://medium.com/@derutycsl/intuitive-understanding-of-mfccs-836d36a1f779), filterbank, and prosodic features. This method now seems outdated, as most recent research focuses on pre-trained models such as [Nvidia's TitaNet](https://huggingface.co/nvidia/speakerverification_en_titanet_large) and [Microsoft's WavLM](https://huggingface.co/docs/transformers/en/model_doc/wavlm); SpeechBrain also offers a model for this. These pre-trained models output **embeddings** that represent the speaker's voice regardless of what was said in …
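For anyone wanting to see the embedding-based verification workflow concretely, here is a minimal sketch using SpeechBrain's pre-trained ECAPA-TDNN speaker model (one of the options mentioned above). The checkpoint name `speechbrain/spkrec-ecapa-voxceleb`, the file names `enroll.wav` / `test.wav`, and the 0.5 decision threshold are illustrative assumptions, not values from the post; the import path is `speechbrain.inference` in newer SpeechBrain releases.

```python
# Sketch: verify two utterances by comparing speaker embeddings from a
# pre-trained model, then thresholding their cosine similarity.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier  # speechbrain.inference in newer releases

# Assumed checkpoint: ECAPA-TDNN trained on VoxCeleb.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

def embed(path: str) -> torch.Tensor:
    """Load a mono waveform, resample to 16 kHz, and return its speaker embedding."""
    signal, sr = torchaudio.load(path)
    if sr != 16000:
        signal = torchaudio.functional.resample(signal, sr, 16000)
    # encode_batch expects [batch, time]; a mono torchaudio load is [1, time].
    return classifier.encode_batch(signal).squeeze()

enroll = embed("enroll.wav")  # enrolled speaker (hypothetical file)
test = embed("test.wav")      # utterance to verify (hypothetical file)

# Cosine similarity between embeddings; accept if above a threshold tuned on held-out data.
score = torch.nn.functional.cosine_similarity(enroll, test, dim=0).item()
print(f"similarity = {score:.3f} -> {'same speaker' if score > 0.5 else 'different speaker'}")
```

The key point the post is making: the pre-trained model does the heavy lifting of mapping raw audio to a speaker embedding, so verification reduces to a simple distance or similarity comparison rather than hand-engineered MFCC pipelines.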

Tags: architecture, deeplearning, embeddings, lstm, pattern, them
