all AI news
Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models
Feb. 27, 2024, 5:47 a.m. | Jeonghwan Kim, Heng Ji
cs.CV updates on arXiv.org arxiv.org
Abstract: Recent advances in instruction-tuned Large Vision-Language Models (LVLMs) have imbued the models with the ability to generate high-level, image-grounded explanations with ease. While such capability is largely attributed to the rich world knowledge contained within the Large Language Models (LLMs), our work reveals their shortcomings in fine-grained visual categorization (FGVC) across six different benchmark settings. Most recent state-of-the-art LVLMs like LLaVa-1.5, InstructBLIP and GPT-4V not only severely deteriorate in terms of classification performance, e.g., average …
abstract advances arxiv capability concept cs.cl cs.cv fine-grained generate image instruction-tuned knowledge language language models large language large language models llms recognition type vision vision-language models visual work world
More from arxiv.org / cs.CV updates on arXiv.org
Retrieval-Augmented Egocentric Video Captioning
2 days, 16 hours ago |
arxiv.org
Mirror-Aware Neural Humans
2 days, 16 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US