all AI news
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
Feb. 27, 2024, 5:47 a.m. | Tiancheng Zhao, Peng Liu, Kyusong Lee
cs.CV updates on arXiv.org arxiv.org
Abstract: The advancement of object detection (OD) in open-vocabulary and open-world scenarios is a critical challenge in computer vision. This work introduces OmDet, a novel language-aware object detection architecture, and an innovative training mechanism that harnesses continual learning and multi-dataset vision-language pre-training. Leveraging natural language as a universal knowledge representation, OmDet accumulates a "visual vocabulary" from diverse datasets, unifying the task as a language-conditioned detection framework. Our multimodal detection network (MDN) overcomes the challenges of multi-dataset …
abstract advancement architecture arxiv challenge computer computer vision continual cs.cl cs.cv dataset detection language multimodal natural natural language network novel open-world pre-training scale training type vision work world
More from arxiv.org / cs.CV updates on arXiv.org
Retrieval-Augmented Egocentric Video Captioning
2 days, 23 hours ago |
arxiv.org
Mirror-Aware Neural Humans
2 days, 23 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US