Sept. 16, 2022, 1:15 a.m. | Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

cs.CV updates on arXiv.org

People say, "A picture is worth a thousand words." Then how can we get the
rich information out of an image? We argue that by using visual clues to
bridge large pretrained vision foundation models and language models, we can do
so without any extra cross-modal training. Thanks to the strong zero-shot
capability of foundation models, we start by constructing a rich semantic
representation of the image (e.g., image tags, object attributes/locations,
captions) as a structured textual prompt, …

arxiv captioning image language vision
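The abstract describes serializing the outputs of zero-shot vision models (tags, object attributes and locations, captions) into a structured textual prompt that a language model can consume directly. Below is a minimal sketch of that idea; the function name `build_prompt`, the field names, and the prompt layout are illustrative assumptions, not the paper's actual format.

```python
# A minimal sketch of the "visual clues as a structured textual prompt" idea
# described in the abstract. The field names and prompt layout are
# assumptions for illustration; the paper's actual format is not shown here.

def build_prompt(tags, objects, caption):
    """Serialize visual clues from vision foundation models into a
    structured textual prompt for a pretrained language model."""
    lines = ["Describe the image in detail, given these visual clues:"]
    lines.append("Tags: " + ", ".join(tags))
    lines.append("Objects:")
    for obj in objects:
        # Each object carries attributes and a bounding-box location.
        lines.append(
            f"- {obj['name']} ({', '.join(obj['attributes'])}) "
            f"at {obj['box']}"
        )
    lines.append(f"Caption: {caption}")
    return "\n".join(lines)


# Hypothetical zero-shot vision-model outputs for a single image.
prompt = build_prompt(
    tags=["dog", "beach", "sunset"],
    objects=[
        {"name": "dog", "attributes": ["brown", "running"],
         "box": (40, 60, 210, 180)},
    ],
    caption="a dog running on a beach at sunset",
)
print(prompt)  # This text would be fed to the language model as-is.
```

Because the prompt is plain text, no cross-modal training is needed: the vision models and the language model only interact through this serialized representation.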
