From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
April 2, 2024, 7:47 p.m. | Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He
Source: cs.CV updates on arXiv.org (arxiv.org)
Abstract: Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLM) by incorporating an image-to-graph generation paradigm. Specifically, we generate scene graph sequences via image-to-text generation with VLM and then …
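To make the image-to-graph idea concrete, here is a minimal sketch of turning a generated text sequence into a scene graph of (subject, predicate, object) triplets. The abstract is truncated above, so this is not the authors' method: the `subject | predicate | object ;` sequence format, the `Triplet` type, and the `parse_sequence` and `build_graph` helpers are all hypothetical, and the example string stands in for what a VLM might emit.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One scene graph edge: a relation between two objects."""
    subject: str
    predicate: str
    obj: str


def parse_sequence(text: str) -> list[Triplet]:
    """Parse 'subject | predicate | object ; ...' into triplets.

    The delimiter format is an assumption made for illustration only.
    """
    triplets = []
    for chunk in text.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        # Skip malformed chunks, which free-form generated text may contain.
        if len(parts) == 3 and all(parts):
            triplets.append(Triplet(*parts))
    return triplets


def build_graph(triplets: list[Triplet]) -> dict[str, list[tuple[str, str]]]:
    """Group edges by subject to form an adjacency-list scene graph."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


# Hypothetical generated sequence; the last chunk is deliberately incomplete.
generated = "man | riding | horse ; horse | standing on | grass ; man | wearing"
triplets = parse_sequence(generated)
graph = build_graph(triplets)
```

Because the predicates are plain strings rather than indices into a fixed label set, a parser like this accepts relation words it has never seen, which is the sense in which a text-generation formulation can be open-vocabulary.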