all AI news
The Pyramid of Captions
May 2, 2024, 4:44 a.m. | Delong Chen, Samuel Cahyawijaya, Etsuko Ishii, Ho Shu Chan, Yejin Bang, Pascale Fung
cs.CV updates on arXiv.org arxiv.org
Abstract: We introduce a formal information-theoretic framework for image captioning by regarding it as a representation learning task. Our framework defines three key objectives: task sufficiency, minimal redundancy, and human interpretability. Building upon this foundation, we propose a novel Pyramid of Captions (PoCa) method, which constructs caption pyramids by generating localized captions for zoomed-in image patches and integrating them with global caption information using large language models. This approach leverages intuition that the detailed examination of …
abstract arxiv building captioning captions cs.cv foundation framework human image information interpretability key novel pyramid redundancy representation representation learning type
More from arxiv.org / cs.CV updates on arXiv.org
Retrieval-Augmented Egocentric Video Captioning
2 days, 20 hours ago |
arxiv.org
Mirror-Aware Neural Humans
2 days, 20 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US