all AI news
Win-Win: Training High-Resolution Vision Transformers from Two Windows
March 25, 2024, 4:45 a.m. | Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel
cs.CV updates on arXiv.org arxiv.org
Abstract: Transformers have become the standard in state-of-the-art vision architectures, achieving impressive performance on both image-level and dense pixelwise tasks. However, training vision transformers for high-resolution pixelwise tasks has a prohibitive cost. Typical solutions boil down to hierarchical architectures, fast and approximate attention, or training on low-resolution crops. This latter solution does not constrain architectural choices, but it leads to a clear performance drop when testing at resolutions significantly higher than that used for training, thus …
abstract architectures art arxiv attention become cost crops cs.cv hierarchical however image low performance resolution solutions standard state tasks training transformers type vision vision transformers windows
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
2 days, 12 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Research Scientist, Demography and Survey Science, University Grad
@ Meta | Menlo Park, CA | New York City
Computer Vision Engineer, XR
@ Meta | Burlingame, CA