Using Left and Right Brains Together: Towards Vision and Language Planning | allainews.com

Feb. 19, 2024, 5:45 a.m. | Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang

cs.CV updates on arXiv.org arxiv.org

arXiv:2402.10534v1 Announce Type: new
Abstract: Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks. However, they inherently perform planning within the language space and lack visual and spatial imagination abilities. In contrast, humans use both the left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, in this work we introduce a novel vision-language planning framework that performs concurrent visual and language planning …
