March 20, 2024, 4:45 a.m. | Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

cs.CV updates on arXiv.org

arXiv:2403.12488v1 Announce Type: new
Abstract: We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini. Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. Specifically, the prompts in the toolkit are designed to guide the MLLM to focus on regional information (e.g., zooming in), read coordinates according to measure standards (e.g., overlaying rulers …

