March 29, 2024, 4:42 a.m. | Chang Sun, Hong Yang, Bo Qin

cs.LG updates on arXiv.org

arXiv:2403.18843v1 Announce Type: cross
Abstract: Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledge distillation approach using a Joint-Embedding Predictive Architecture (JEPA), named JEP-KD, designed to more effectively utilize audio features during model training. Central to JEP-KD is the inclusion of a generative network within the embedding layer, which …
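The abstract describes the core of JEP-KD as a generative network in the embedding layer that predicts audio features from video features, so that a distillation loss can pull the video branch toward the richer audio representation. A minimal, purely illustrative sketch of that idea follows; the linear predictor, the embedding sizes, and the MSE objective are assumptions for demonstration, not the paper's actual architecture.

```python
# Toy sketch of the JEP-KD idea: a small generative "predictor" maps a
# video embedding to a predicted audio embedding, and the distillation
# loss penalises the gap between the prediction and the real (teacher)
# audio embedding. All names and sizes here are illustrative assumptions.

def predict_audio_embedding(video_emb, weights, bias):
    """Toy generative network: a single linear map (assumption)."""
    return [
        sum(w * v for w, v in zip(row, video_emb)) + b
        for row, b in zip(weights, bias)
    ]

def distillation_loss(pred_audio_emb, audio_emb):
    """Mean squared error between predicted and real audio embeddings."""
    n = len(audio_emb)
    return sum((p - a) ** 2 for p, a in zip(pred_audio_emb, audio_emb)) / n

# Tiny worked example with 2-d embeddings.
video_emb = [1.0, 2.0]
weights = [[0.5, 0.0], [0.0, 0.5]]  # toy weights, chosen by hand
bias = [0.0, 0.0]
audio_emb = [0.5, 1.0]              # "teacher" audio embedding

pred = predict_audio_embedding(video_emb, weights, bias)
loss = distillation_loss(pred, audio_emb)
```

In a real training loop the predictor would be a learned network and this loss would be combined with the usual VSR objective; here the weights happen to map the video embedding exactly onto the audio embedding, so the loss is zero.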

