DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
March 20, 2024, 4:45 a.m. | Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr
cs.CV updates on arXiv.org
Abstract: We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini. Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. Specifically, the prompts in the toolkit are designed to guide the MLLM to focus on regional information (e.g., zooming in), read coordinates according to measure standards (e.g., overlaying rulers …
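The abstract names two kinds of prompts: zooming in on a region and overlaying rulers so the model can read coordinates. The sketch below shows the coordinate bookkeeping such prompts would need. It is hypothetical and is not the DetToolChain implementation. It assumes a box reported on an enlarged crop is mapped back to original-image pixels by undoing the zoom factor and adding the crop offset, and it assumes evenly spaced ruler ticks. The names `Box`, `ruler_ticks`, and `crop_to_image` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def ruler_ticks(length: int, step: int) -> list[int]:
    """Hypothetical helper: tick positions (pixels) for a ruler along one axis.

    Labelling these positions on the image gives the model a visual scale
    to read coordinates against.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, length + 1, step))


def crop_to_image(box_in_zoom: Box, crop: Box, zoom: float) -> Box:
    """Hypothetical helper: map a box predicted on a zoomed crop back to the image.

    `crop` is the region cut from the original image (original pixels), and
    `zoom` is the factor by which that crop was enlarged before prompting.
    Results are clamped so the mapped box stays inside the crop.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")

    def clamp(v: float, lo: float, hi: float) -> float:
        return max(lo, min(hi, v))

    return Box(
        clamp(crop.x0 + box_in_zoom.x0 / zoom, crop.x0, crop.x1),
        clamp(crop.y0 + box_in_zoom.y0 / zoom, crop.y0, crop.y1),
        clamp(crop.x0 + box_in_zoom.x1 / zoom, crop.x0, crop.x1),
        clamp(crop.y0 + box_in_zoom.y1 / zoom, crop.y0, crop.y1),
    )


if __name__ == "__main__":
    crop = Box(100, 50, 300, 250)          # region zoomed into (original pixels)
    pred = Box(40, 60, 200, 160)           # box reported on the 2x-enlarged crop
    print(crop_to_image(pred, crop, 2.0))  # Box(x0=120.0, y0=80.0, x1=200.0, y1=130.0)
    print(ruler_ticks(640, 100))           # [0, 100, 200, 300, 400, 500, 600]
```

Keeping the mapping as a pure function makes it easy to chain several zoom steps. Each step's output becomes the next step's `crop`.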