June 10, 2024, 4:48 a.m. | Feiyu Pan, Hao Fang, Xiankai Lu

cs.CV updates on arXiv.org

arXiv:2406.04842v1 Announce Type: new
Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video, which places the emphasis on modeling dense text-video relations. Current RVOS methods typically use independently pre-trained vision and language models as backbones, resulting in a significant domain gap between video and text. In cross-modal feature interaction, text features are used only as query initialization, so important information in the text is not fully utilized. In this work, we propose using frozen pre-trained …
