March 26, 2024, 4:49 a.m. | Daniela Massiceti, Camilla Longden, Agnieszka Słowik, Samuel Wills, Martin Grayson, Cecily Morrison

cs.CV updates on arXiv.org arxiv.org

arXiv:2311.17315v3 Announce Type: replace
Abstract: Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower …

