April 2, 2024, 7:47 p.m. | Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He

cs.CV updates on arXiv.org arxiv.org

arXiv:2404.00906v1 Announce Type: new
Abstract: Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLM) by incorporating an image-to-graph generation paradigm. Specifically, we generate scene graph sequences via image-to-text generation with VLM and then …

abstract arxiv challenge concepts cs.cv generate graph graph representation graphs intermediate language language models novel pixels reasoning representation struggle tasks type vision vision-language models visual

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Data Analyst, Client Insights and Analytics - New Graduate, Full Time

@ Scotiabank | Toronto, ON, CA

Consultant Senior Data Scientist (H/F)

@ Publicis Groupe | Paris, France

Data Analyst H/F - CDI

@ Octapharma | Lingolsheim, FR

Lead AI Engineer

@ Ford Motor Company | United States

Senior Staff Machine Learning Engineer

@ Warner Bros. Discovery | CA San Francisco 153 Kearny Street