Incorporating Language-Driven Appearance Knowledge Units with Visual Cues in Pedestrian Detection. (arXiv:2311.01025v1 [cs.CV])
cs.CV updates on arXiv.org
Large language models (LLMs) have shown their capability in understanding
contextual and semantic information regarding appearance knowledge of
instances. In this paper, we introduce a novel approach to utilize the strength
of an LLM in understanding contextual appearance variations and to transfer its
knowledge into a vision model (here, pedestrian detection). While pedestrian
detection is considered one of the crucial tasks directly related to our safety
(e.g., intelligent driving systems), it is challenging because of varying
appearances and poses in diverse …