March 27, 2024, 4:46 a.m. | Jeong-Yoon Kim, Seung-Ho Lee

cs.CV updates on arXiv.org

arXiv:2403.17327v1 Announce Type: cross
Abstract: In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation of frequency (y-axis) with time (x-axis) in the spectrogram, and by transferring positional information between ViTs through knowledge transfer. The proposed method has the following originality: i) We use vertically segmented patches of the log-Mel spectrogram to analyze the correlation of frequencies over time. This type of patch allows us to correlate …
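As a rough illustration of the vertical patch idea, the sketch below shows how full-height strips of a log-Mel spectrogram could be turned into ViT tokens, so that each token spans the entire frequency axis for a short window of time. This is not the authors' code; the mel-bin count, patch width, and embedding dimension are assumptions chosen for the example.

```python
# A minimal sketch (not the paper's implementation) of vertically segmented
# spectrogram patches for a ViT-style encoder. Shapes and hyperparameters
# (n_mels, patch_width, embed_dim) are illustrative assumptions.
import torch
import torch.nn as nn

class VerticalPatchEmbed(nn.Module):
    """Splits a (batch, 1, n_mels, n_frames) log-Mel spectrogram into
    vertical strips covering the full frequency axis, then projects each
    strip to a token embedding."""

    def __init__(self, n_mels=128, patch_width=4, embed_dim=192):
        super().__init__()
        # A conv kernel covering all mel bins but only `patch_width` frames
        # produces one token per time strip (full-height patches).
        self.proj = nn.Conv2d(1, embed_dim,
                              kernel_size=(n_mels, patch_width),
                              stride=(n_mels, patch_width))

    def forward(self, spec):
        # spec: (B, 1, n_mels, n_frames)
        x = self.proj(spec)               # (B, embed_dim, 1, n_frames // patch_width)
        x = x.flatten(2).transpose(1, 2)  # (B, num_tokens, embed_dim)
        return x

# Example: 128 mel bins, 400 frames -> 100 vertical-patch tokens.
tokens = VerticalPatchEmbed()(torch.randn(2, 1, 128, 400))
print(tokens.shape)  # torch.Size([2, 100, 192])
```

Compared with the square patches of a standard ViT, each token here summarizes the whole frequency range at one point in time, which is what lets the transformer attend to frequency-over-time correlations as described in the abstract.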

