Feb. 13, 2024, 5:44 a.m. | Qian Yang Jin Xu Wenrui Liu Yunfei Chu Ziyue Jiang Xiaohuan Zhou Yichong Leng Yuanjun Lv

cs.LG updates on arXiv.org

Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded progress in this field. Previous benchmarks primarily focus on assessing fundamental tasks, such as Automatic Speech Recognition (ASR), and lack any assessment of the open-ended generative capabilities centered around audio. It is therefore challenging to track progress in the Large Audio-Language Models (LALMs) domain and to provide guidance for future improvement. In this paper, …

