[R] Speaker diarization
April 24, 2024, 3:01 p.m. | /u/anuragrawall
Machine Learning www.reddit.com
I am working on a project where I want to create speaker-aware transcripts from audio and video files, preferably using open-source solutions. I have tried many approaches, but nothing seems to work well enough out of the box.
I have tried:
1. whisperX: [https://github.com/m-bain/whisperX](https://github.com/m-bain/whisperX) (uses pyannote)
2. whisper-diarization: [https://github.com/MahmoudAshraf97/whisper-diarization](https://github.com/MahmoudAshraf97/whisper-diarization) (uses NeMo)
3. AWS Transcribe
4. AssemblyAI API
5. Picovoice API
I'll need to dig deeper and understand what's causing the incorrect diarization, but I am looking for suggestions to …
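
For anyone debugging a similar pipeline, it can help to separate the two stages (transcription and diarization) and look at how they get merged. Below is a minimal sketch of one common merging step: each transcript segment gets the speaker whose diarization turns overlap it the most in time. It is a generic illustration, not the code of any tool listed above; the `Turn` and `Segment` classes, the `assign_speakers` function, and the sample timestamps are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn: who spoke between start and end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcript segment with timestamps (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker that overlaps it the longest."""
    labeled = []
    for seg in segments:
        totals = {}  # speaker -> total overlapping seconds
        for turn in turns:
            dur = overlap(seg.start, seg.end, turn.start, turn.end)
            if dur > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + dur
        # Fall back to a placeholder when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


# Hypothetical sample data, for illustration only.
turns = [Turn(0.0, 4.0, "SPEAKER_00"), Turn(4.0, 9.0, "SPEAKER_01")]
segments = [
    Segment(0.2, 3.5, "Hi, thanks for joining."),
    Segment(3.8, 8.0, "Happy to be here."),
    Segment(10.0, 11.0, "(music)"),
]
result = assign_speakers(segments, turns)
for speaker, text in result:
    print(f"{speaker}: {text}")
```

Printing the intermediate `totals` for the segments that come out wrong is one way to tell whether the mistake comes from the diarization turns themselves or from transcript timestamps that straddle a speaker change.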