June 25, 2024, 4:50 a.m. | Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson

cs.LG updates on arXiv.org

arXiv:2312.17174v2 Announce Type: replace-cross
Abstract: Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of …
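To make the idea concrete, below is a minimal sketch of an information-bottleneck-style attribution for a CLIP-like model, in the spirit of the abstract. It is not the authors' M2IB implementation: the helper names (`m2ib_style_attribution`, `project`, `other_emb`), the single-sample tensor shapes, the Gaussian noise bottleneck, and hyperparameters such as `beta` and the step count are all illustrative assumptions. The sketch optimizes per-token mixing weights that trade cross-modal alignment (relevance) against a KL compression penalty, then reads the weights back as an attribution map.

```python
import torch
import torch.nn.functional as F

def m2ib_style_attribution(feat, other_emb, project, beta=0.1, steps=50, lr=0.5):
    """Sketch of an information-bottleneck attribution map (illustrative, not the paper's code).

    feat      : intermediate features of one modality, shape (1, tokens, dim)
    other_emb : pooled embedding of the other modality, shape (1, dim)
    project   : callable mapping (noised) features to a pooled embedding of shape (1, dim)
    """
    # Per-token mixing weights lambda in (0, 1): lambda ~ 1 keeps the feature,
    # lambda ~ 0 replaces it with Gaussian noise (compression).
    logits = torch.zeros(feat.shape[:2], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    mu, std = feat.mean(), feat.std()

    for _ in range(steps):
        lam = torch.sigmoid(logits).unsqueeze(-1)              # (1, tokens, 1)
        noise = mu + std * torch.randn_like(feat)
        z = lam * feat + (1.0 - lam) * noise                   # stochastic bottleneck
        emb = F.normalize(project(z), dim=-1)
        fit = (emb * F.normalize(other_emb, dim=-1)).sum()     # cross-modal alignment

        # KL between the bottlenecked Gaussian N(lam*f + (1-lam)*mu, ((1-lam)*std)^2)
        # and the feature-level prior N(mu, std^2): penalizes retained information.
        kl = (-torch.log(1.0 - lam + 1e-6)
              + ((1.0 - lam) ** 2 * std ** 2 + (lam * (feat - mu)) ** 2) / (2 * std ** 2)
              - 0.5).mean()

        loss = -fit + beta * kl
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Tokens that survive compression while preserving alignment score highest.
    return torch.sigmoid(logits).detach()
```

Run once per modality (image patch tokens against the text embedding, and text tokens against the image embedding) to obtain attribution maps for both sides of the model; the actual method and hyperparameters are described in the paper.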

