MTGA: Multi-view Temporal Granularity aligned Aggregation for Event-based Lip-reading
April 19, 2024, 4:44 a.m. | Wenhao Zhang, Jun Wang, Yong Luo, Lei Yu, Wei Yu, Zheng He
cs.CV updates on arXiv.org arxiv.org
Abstract: Lip-reading uses the visual information of a speaker's lip movements to recognize words and sentences. Existing event-based lip-reading solutions integrate branches with different frame rates to learn spatio-temporal features of varying granularities. However, aggregating events into event frames inevitably loses fine-grained temporal information within each frame. To remedy this drawback, we propose a novel framework termed Multi-view Temporal Granularity aligned Aggregation (MTGA). Specifically, we first present a novel event representation method, …
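The loss of within-frame timing that the abstract mentions can be illustrated with a small sketch. This is not the MTGA method; it only shows generic event-to-frame accumulation under assumed conventions. The event tuple layout `(timestamp_us, x, y, polarity)`, the function name `events_to_frames`, and all parameter names are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical event format (an assumption, not taken from the paper):
# (timestamp in microseconds, x, y, polarity), with polarity in {+1, -1}.
Event = Tuple[int, int, int, int]


def events_to_frames(
    events: List[Event],
    duration_us: int,
    num_frames: int,
    height: int,
    width: int,
) -> List[List[List[int]]]:
    """Accumulate signed event counts into `num_frames` equal time bins."""
    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    bin_us = duration_us / num_frames
    for t, x, y, p in events:
        # Clamp so an event at exactly `duration_us` falls in the last bin.
        idx = min(int(t / bin_us), num_frames - 1)
        frames[idx][y][x] += p
    return frames


# Two events at the same pixel, 4 ms apart, with opposite polarity.
events = [(1_000, 0, 0, +1), (5_000, 0, 0, -1)]

# One coarse frame: both events share a bin, so they cancel and their
# order and spacing can no longer be recovered from the frame.
low_rate = events_to_frames(events, duration_us=10_000, num_frames=1, height=1, width=1)

# Ten finer frames: the events land in different bins and stay distinguishable.
high_rate = events_to_frames(events, duration_us=10_000, num_frames=10, height=1, width=1)
```

In this toy case the single coarse frame holds a zero at the pixel, while the finer binning keeps the two events in separate frames. Any fixed bin width still discards the timing of events that fall inside the same bin, which is the drawback the abstract describes.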