Feb. 26, 2024, 5:46 a.m. | Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

cs.CV updates on arXiv.org arxiv.org

arXiv:2402.15151v1 Announce Type: new
Abstract: In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished only by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize context modeling capability by leveraging the power of LLMs. Specifically, VSP-LLM is designed …