Web: http://arxiv.org/abs/2202.06709

May 12, 2022, 1:11 a.m. | Namuk Park, Songkuk Kim

cs.LG updates on arXiv.org

The success of multi-head self-attentions (MSAs) for computer vision is now
indisputable. However, little is known about how MSAs work. We present
fundamental explanations to help better understand the nature of MSAs. In
particular, we demonstrate the following properties of MSAs and Vision
Transformers (ViTs): (1) MSAs improve not only accuracy but also generalization
by flattening the loss landscapes. Such improvement is primarily attributable
to their data specificity, not long-range dependency. On the other hand, ViTs
suffer from non-convex losses. …
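To make property (1) concrete, below is a minimal PyTorch sketch of a multi-head self-attention block operating on patch tokens, as in a ViT. The class name, token count, and dimensions are illustrative assumptions, not taken from the paper. The inline comment marks what the abstract calls data specificity: the spatial mixing weights are computed from the input itself, unlike a convolution's fixed, input-independent kernel.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention over a sequence of patch tokens."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (b, heads, n, head_dim)
        # The attention weights are a function of the input tokens themselves,
        # so the spatial mixing is data-specific (a convolution's kernel is not).
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (b, heads, n, n)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

# Example: 196 patch tokens (14x14 patches of a 224x224 image) with width 384.
tokens = torch.randn(2, 196, 384)
msa = MultiHeadSelfAttention(dim=384, num_heads=6)
print(msa(tokens).shape)  # torch.Size([2, 196, 384])
```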
