March 5, 2024, 2:49 p.m. | Chen Cai, Suchen Wang, Kim-hui Yap, Yi Wang

cs.CV updates on arXiv.org arxiv.org

arXiv:2306.07490v3 Announce Type: replace
Abstract: Weakly-supervised grounded image captioning (WSGIC) aims to generate the caption and ground (localize) predicted object words in the input image without using bounding box supervision. Recent two-stage solutions mostly apply a bottom-up pipeline: (1) encode the input image into multiple region features using an object detector; (2) leverage region features for captioning and grounding. However, utilizing independent proposals produced by object detectors tends to make the subsequent grounded captioner overfitted in finding the correct object …

abstract apply arxiv box captioning cs.cv encode features framework generate image multiple pipeline solutions stage supervision type weakly-supervised words

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Business Data Scientist, gTech Ads

@ Google | Mexico City, CDMX, Mexico

Lead, Data Analytics Operations

@ Zocdoc | Pune, Maharashtra, India