March 22, 2024, 4:42 a.m. | Tim Salzmann, Markus Ryll, Alex Bewley, Matthias Minderer

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.14270v1 Announce Type: cross
Abstract: Visual relationship detection aims to identify objects and their relationships in images. Prior methods approach this task by adding separate relationship modules or decoders to existing object detection architectures. This separation increases complexity and hinders end-to-end training, which limits performance. We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection. Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly. To extract relationship …

