April 1, 2024, 4:43 a.m. | Chancharik Mitra, Brandon Huang, Trevor Darrell, Roei Herzig

cs.LG updates on arXiv.org arxiv.org

arXiv:2311.17076v2 Announce Type: replace-cross
Abstract: The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks. However, recent research has shown that even the most advanced LMMs still struggle to capture aspects of compositional visual reasoning, such as attributes and relationships between objects. One solution is to utilize scene graphs (SGs), a formalization of objects and their relations and …

