March 20, 2024, 4:45 a.m. | Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

cs.CV updates on arXiv.org arxiv.org

arXiv:2403.12488v1 Announce Type: new
Abstract: We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini. Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. Specifically, the prompts in the toolkit are designed to guide the MLLM to focus on regional information (e.g., zooming in), read coordinates according to measure standards (e.g., overlaying rulers …

abstract arxiv cs.ai cs.cv detection gemini gpt gpt-4v language language models large language large language models mllm mllms multimodal novel object paradigm precision prompting thought toolkit type zero-shot

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Sr. VBI Developer II

@ Atos | Texas, US, 75093

Wealth Management - Data Analytics Intern/Co-op Fall 2024

@ Scotiabank | Toronto, ON, CA