March 26, 2024, 4:49 a.m. | Daniela Massiceti, Camilla Longden, Agnieszka Słowik, Samuel Wills, Martin Grayson, Cecily Morrison

cs.CV updates on arXiv.org

arXiv:2311.17315v3 Announce Type: replace
Abstract: Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower …
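
The abstract does not spell out the evaluation pipeline, but the zero-shot classification setup it references follows the standard CLIP recipe: encode an image and a set of text prompts, then pick the label whose prompt is most similar to the image. Below is a minimal sketch of that setup, assuming the Hugging Face `transformers` CLIP implementation; the checkpoint name, image path, and candidate labels are illustrative placeholders, not details taken from the paper.

```python
# Minimal sketch of zero-shot image classification with one CLIP variant.
# Assumes the Hugging Face `transformers` CLIP API; checkpoint, image path,
# and labels below are illustrative, not from the paper's benchmark.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"  # one of many CLIP variants
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

image = Image.open("example.jpg")  # e.g. a photo captured by a BLV user
labels = ["a mug", "a remote control", "a medication bottle"]  # hypothetical classes
prompts = [f"a photo of {label}" for label in labels]

# Encode the image and all candidate prompts in one forward pass.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity scores; softmax over labels gives zero-shot probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
predicted = labels[probs.argmax(dim=-1).item()]
print(predicted, probs.tolist())
```

Swapping `model_name` for other checkpoints is how an evaluation over multiple CLIP variants, like the 25 tested in the paper, would typically be run.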

