March 19, 2024, 4:49 a.m. | Weiran Chen, Xin Li, Jiaqi Su, Guiqian Zhu, Ying Li, Yi Ji, Chunping Liu

cs.CV updates on arXiv.org

arXiv:2403.11550v1 Announce Type: new
Abstract: As a cross-modal task, visual storytelling aims to automatically generate a story for an ordered image sequence. Unlike image captioning, visual storytelling requires not only modeling the relationships between objects within an image but also mining the connections between adjacent images. Recent approaches primarily use either end-to-end or multi-stage frameworks to generate relevant stories, but they usually overlook latent topic information. In this paper, in order to generate a more coherent …

