Jan. 23, 2024, 2:37 p.m. | /u/Sufficient-Tennis189

r/MachineLearning (www.reddit.com)

A month ago, Meta AI released W2v-BERT, one of the building blocks of its Seamless models.

It's been pretrained on 4.5M hours of unlabeled audio data, covering more than 143 languages.
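
To make the "building block" concrete, here's a minimal sketch of loading the encoder through Hugging Face Transformers and extracting frame-level embeddings. The checkpoint id is my assumption, not from the post; check the Hub for the exact name:

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

# Checkpoint id is an assumption; look up the exact name on the HF Hub.
checkpoint = "facebook/w2v-bert-2.0"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2BertModel.from_pretrained(checkpoint)

# The model expects 16 kHz mono audio; one second of silence as a stand-in.
audio = np.zeros(16_000, dtype=np.float32)
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
```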


Pros:

* Enables low-resource fine-tuning
* Faster and lighter than Whisper
* MIT license
* Can be fine-tuned for other audio tasks

Cons:

* CTC-based, so it's meant for normalized transcriptions (no casing or punctuation)
* Needs to be fine-tuned before use, since it ships as a bare pretrained encoder (see the sketch below)
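
Because it's a bare encoder, ASR fine-tuning means attaching a CTC head on top. A hedged sketch using Transformers' `Wav2Vec2BertForCTC` follows; the checkpoint id and vocab size are placeholders, not from the post:

```python
# Hedged sketch: attach a randomly initialized CTC head to the pretrained
# encoder for ASR fine-tuning. Checkpoint id and vocab_size are assumptions;
# in practice the vocabulary comes from a tokenizer built on your dataset.
from transformers import Wav2Vec2BertForCTC

model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",   # assumed checkpoint id
    vocab_size=32,             # placeholder: size of your CTC character vocab
    ctc_loss_reduction="mean",
)

# Training then follows the usual Transformers CTC recipe: build a processor
# (feature extractor + character tokenizer), map audio to input_features and
# text to labels, and train with Trainer or a custom loop.
```

Note that since the head is trained with CTC on your labels, the output will only carry casing and punctuation if the training transcriptions do.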


Resources:

* Original repository: [https://github.com/facebookresearch/seamless_communication?tab=readme-ov-file#whats-new](https://github.com/facebookresearch/seamless_communication?tab=readme-ov-file#whats-new)
* Transformers docs: [https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert](https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert)
* ASR fine-tuning on …

