Web: http://arxiv.org/abs/2205.02444

May 6, 2022, 1:11 a.m. | Rong Ye, Mingxuan Wang, Lei Li

cs.CL updates on arXiv.org

How can we learn unified representations for spoken utterances and their
written text? Learning similar representations for semantically similar speech
and text is important for speech translation. To this end, we propose ConST, a
cross-modal contrastive learning method for end-to-end speech-to-text
translation. We evaluate ConST and a variety of previous baselines on the
popular MuST-C benchmark. Experiments show that the proposed ConST consistently
outperforms the previous methods and achieves an average BLEU of 29.4. The
analysis further verifies that …

arxiv cross learning speech translation
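
This snippet only names the technique, so below is a minimal, hypothetical sketch of a generic cross-modal contrastive (InfoNCE-style) objective between paired speech and text embeddings. The function name, variable names, and temperature value are illustrative assumptions, not ConST's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(speech_emb, text_emb, temperature=0.1):
    """Pull paired speech/text embeddings together, push mismatched pairs apart.

    speech_emb, text_emb: (batch, dim) tensors from the speech and text
    encoders for the same batch of parallel utterances and transcripts.
    (Names and temperature are illustrative, not ConST's exact settings.)
    """
    # Normalize so dot products become cosine similarities.
    speech_emb = F.normalize(speech_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares speech i with text j.
    logits = speech_emb @ text_emb.t() / temperature

    # The matched pair for each utterance sits on the diagonal.
    targets = torch.arange(speech_emb.size(0), device=speech_emb.device)

    # Symmetric loss over speech-to-text and text-to-speech directions.
    loss_s2t = F.cross_entropy(logits, targets)
    loss_t2s = F.cross_entropy(logits.t(), targets)
    return (loss_s2t + loss_t2s) / 2
```

In a setup like this, the contrastive term would typically be added, with some weight, to the standard speech-to-text translation cross-entropy loss during end-to-end training.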
