March 15, 2024, 4:46 a.m. | Meng Chu, Zhedong Zheng, Wei Ji, Tingyu Wang, Tat-Seng Chua

cs.CV updates on arXiv.org

arXiv:2311.12751v2 Announce Type: replace
Abstract: Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data. To address this pressing need, we introduce GeoText-1652, a new natural language-guided geo-localization benchmark. This dataset is systematically constructed through an interactive human-computer process leveraging Large Language Model (LLM) driven annotation techniques in conjunction with pre-trained vision models. GeoText-1652 extends the established University-1652 image dataset with spatial-aware text …

