Top-Down Framework for Weakly-supervised Grounded Image Captioning | allainews.com

March 5, 2024, 2:49 p.m. | Chen Cai, Suchen Wang, Kim-hui Yap, Yi Wang

cs.CV updates on arXiv.org arxiv.org

arXiv:2306.07490v3 Announce Type: replace
Abstract: Weakly-supervised grounded image captioning (WSGIC) aims to generate the caption and ground (localize) predicted object words in the input image without using bounding box supervision. Recent two-stage solutions mostly apply a bottom-up pipeline: (1) encode the input image into multiple region features using an object detector; (2) leverage region features for captioning and grounding. However, utilizing independent proposals produced by object detectors tends to make the subsequent grounded captioner overfitted in finding the correct object …

abstract apply arxiv box captioning cs.cv encode features framework generate image multiple pipeline solutions stage supervision type weakly-supervised words

More from arxiv.org / cs.CV updates on arXiv.org

A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and … 8 hours ago | arxiv.org

abstract arxiv beyond cs.cv +16

Enhancing Super-Resolution Networks through Realistic Thick-Slice CT Simulation 8 hours ago | arxiv.org

abstract acquisition arxiv cs.ai +20

TransRUPNet for Improved Polyp Segmentation 8 hours ago | arxiv.org

arxiv cs.cv eess.iv segmentation +1

An interpretable machine learning system for colorectal cancer diagnosis from pathology slides 8 hours ago | arxiv.org

abstract artificial artificial intelligence arxiv +19

Attention is All They Need: Exploring the Media Archaeology of the Computer Vision Research Paper 8 hours ago | arxiv.org

abstract archaeology arxiv attention +22

Refining Remote Photoplethysmography Architectures using CKA and Empirical Methods 8 hours ago | arxiv.org

abstract architecture architectures arxiv +8

Learning to Complement with Multiple Humans 8 hours ago | arxiv.org

abstract adoption arxiv assumptions +12

HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition 8 hours ago | arxiv.org

abstract advances arxiv challenges +12

Image-Based Virtual Try-On: A Survey 8 hours ago | arxiv.org

arxiv cs.cv image survey +3

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Business Data Scientist, gTech Ads

@ Google | Mexico City, CDMX, Mexico

View on ai-jobs.net

Lead, Data Analytics Operations

@ Zocdoc | Pune, Maharashtra, India

View on ai-jobs.net