MTGA: Multi-view Temporal Granularity aligned Aggregation for Event-based Lip-reading
April 19, 2024, 4:44 a.m. | Wenhao Zhang, Jun Wang, Yong Luo, Lei Yu, Wei Yu, Zheng He
cs.CV updates on arXiv.org
Abstract: Lip-reading uses the visual information in a speaker's lip movements to recognize words and sentences. Existing event-based lip-reading solutions integrate branches with different frame rates to learn spatio-temporal features of varying granularities. However, aggregating events into event frames inevitably loses the fine-grained temporal information within each frame. To remedy this drawback, we propose a novel framework termed Multi-view Temporal Granularity aligned Aggregation (MTGA). Specifically, we first present a novel event representation method, …
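The loss the abstract describes can be illustrated with a small sketch. This is not the paper's method or code: the function name `events_to_frames`, the `(t, x, y, polarity)` event layout, the signed-count accumulation, and the toy timestamps are all assumptions made for illustration. It only shows that binning events into frames discards timing within a frame, and that a higher frame rate recovers some of it.

```python
from collections import defaultdict


def events_to_frames(events, frame_duration_us, height, width):
    """Accumulate (t, x, y, polarity) events into per-frame signed count grids.

    Hypothetical helper: polarity 1 adds +1 and polarity 0 adds -1 at pixel
    (y, x). The timestamp t is used only to pick the frame index, so the
    ordering and spacing of events inside one frame are discarded.
    """
    frames = defaultdict(lambda: [[0] * width for _ in range(height)])
    for t, x, y, polarity in events:
        frame_index = t // frame_duration_us
        frames[frame_index][y][x] += 1 if polarity else -1
    return dict(frames)


# Toy event stream on a 2x2 sensor, timestamps in microseconds (made up).
events = [
    (100, 0, 0, 1),
    (900, 0, 0, 1),
    (1100, 1, 0, 0),
    (1900, 1, 1, 1),
]

# Low frame rate: all four events collapse into a single frame.
coarse = events_to_frames(events, frame_duration_us=2000, height=2, width=2)

# Higher frame rate: the same events are split across two frames.
fine = events_to_frames(events, frame_duration_us=1000, height=2, width=2)
```

In the coarse setting, the events at 100 µs and 1900 µs become indistinguishable in time, which is the kind of within-frame information the abstract says is lost.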