all AI news
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
March 20, 2024, 4:45 a.m. | Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr
cs.CV updates on arXiv.org arxiv.org
Abstract: We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini. Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. Specifically, the prompts in the toolkit are designed to guide the MLLM to focus on regional information (e.g., zooming in), read coordinates according to measure standards (e.g., overlaying rulers …
abstract arxiv cs.ai cs.cv detection gemini gpt gpt-4v language language models large language large language models mllm mllms multimodal novel object paradigm precision prompting thought toolkit type zero-shot
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Sr. VBI Developer II
@ Atos | Texas, US, 75093
Wealth Management - Data Analytics Intern/Co-op Fall 2024
@ Scotiabank | Toronto, ON, CA