Jan. 18, 2022, 4:24 p.m. | /u/Yagna24

Machine Learning www.reddit.com

I'm working with an audio WAV file as input that has 2 to 3 people conversing.

  1. Identify the two different speakers and separate the two voices (a rough sketch of the kind of thing I mean is right after this list).

  2. Convert these voices into text and store their transcripts.
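For step 1, the closest ready-made thing I have found so far is the pretrained speaker diarization pipeline in pyannote.audio (not one of the models I listed below). A minimal sketch of how I understand it would be used — the pipeline name, access token, and file path here are placeholders on my part, and the exact API may differ between versions:

```python
from pyannote.audio import Pipeline

# Pretrained diarization pipeline from the huggingface hub
# (pipeline name and token are placeholders)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization", use_auth_token="HF_TOKEN"
)

# Run diarization on the whole conversation
diarization = pipeline("conversation.wav")

# Each turn says which speaker is talking between which timestamps
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s -> {turn.end:.1f}s")
```

The timestamps could then be used to cut the WAV into per-speaker chunks that get transcribed one by one.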

I have gone through huggingface transformers models such as wav2vec2 and svoice from Facebook Research, but I find it difficult to implement them.
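For step 2, the kind of thing I have been attempting with wav2vec2, based on the transformers examples (the checkpoint name and file path are just placeholders), looks roughly like this:

```python
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Pretrained English ASR checkpoint (placeholder; any wav2vec2 CTC checkpoint works)
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load the WAV file and resample to the 16 kHz the model expects
waveform, sample_rate = torchaudio.load("conversation.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
waveform = waveform.mean(dim=0)  # mix down to mono if the file is stereo

# Run the model and decode the CTC output into text
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcript = processor.batch_decode(predicted_ids)[0]
print(transcript)
```

This only gives a single transcript for the whole file, which is why I still need step 1 to split the speakers first.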

Can someone guide me on approaching such tasks, as I am a beginner in the audio domain of …

algorithm machinelearning speech
