all AI news
GSVA: Generalized Segmentation via Multimodal Large Language Models
March 20, 2024, 4:46 a.m. | Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang
cs.CV updates on arXiv.org arxiv.org
Abstract: Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image. GRES poses challenges in modeling the complex spatial relationships of the instances in the image and identifying non-existing referents. Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks. Connecting Large Language Models (LLMs) and vision models, MLLMs are proficient in understanding …
abstract arxiv challenges cs.cv generalized identify image instances language language models large language large language models modeling multimodal multiple objects relationships segmentation spatial targets type via
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
2 days, 3 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Research Scientist
@ Meta | Menlo Park, CA
Principal Data Scientist
@ Mastercard | O'Fallon, Missouri (Main Campus)