April 14, 2024, 1:23 p.m. | /u/Puzzleheaded_Bee5489

Machine Learning www.reddit.com

I'm working on a user authentication project using **voice**, i.e., **voice authentication**. I was researching which aspects of an audio/speech signal I can use to identify a particular person. One of the most commonly used is [MFCC](https://www.kaggle.com/code/ilyamich/mfcc-implementation-and-tutorial) features, which can be extracted with any standard audio processing library such as Librosa.
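To make the idea concrete, here is a rough sketch of how per-frame features like MFCCs could be turned into a yes/no speaker check. It is not from the post: averaging frames into one vector, scoring with cosine similarity, the helper names (`mean_vector`, `cosine_similarity`, `same_speaker`), the `0.9` threshold, and the toy numbers are all assumptions for illustration. It uses plain Python lists with one row per frame; real MFCCs from `librosa.feature.mfcc` come back as an array of shape `(n_mfcc, n_frames)`, so they would need transposing first.

```python
import math


def mean_vector(frames):
    """Average per-frame feature vectors into one fixed-length vector."""
    if not frames:
        raise ValueError("need at least one frame")
    n_features = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames)
            for i in range(n_features)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no match"
    return dot / (norm_a * norm_b)


def same_speaker(enrolled_frames, attempt_frames, threshold=0.9):
    """Return (accepted, score). The threshold is a hypothetical value
    and would have to be tuned on real recordings."""
    score = cosine_similarity(mean_vector(enrolled_frames),
                              mean_vector(attempt_frames))
    return score >= threshold, score


# Toy 3-dimensional "features" (made-up numbers, not real MFCCs).
enrolled = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1]]
attempt = [[1.1, 2.1, 2.9], [0.9, 1.9, 3.0]]
impostor = [[-3.0, 0.5, 0.1], [-2.8, 0.7, -0.2]]

print(same_speaker(enrolled, attempt))
print(same_speaker(enrolled, impostor))
```

Averaging over time throws away a lot of information, so this is only a baseline for seeing how the comparison step works, not a recommendation for a production system.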

Now, in recent times we have embeddings, which essentially capture information in the form of vectors; it could be audio, video, …

