Web: http://arxiv.org/abs/2112.09747

Sept. 20, 2022, 1:13 a.m. | Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

cs.CV updates on arXiv.org

This work presents a simple vision transformer design as a strong baseline
for object localization and instance segmentation tasks. Transformers have recently
demonstrated competitive performance in image classification tasks. To adapt ViT
to object detection and dense prediction tasks, many works inherit the
multistage design of convolutional networks and use highly customized ViT
architectures. The goal behind this design is a better trade-off
between computational cost and effective aggregation of multiscale global
contexts. However, existing works adopt the multistage architectural …
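Some simple arithmetic makes the cost side of this trade-off concrete. With full global self-attention, each head's attention matrix has one entry per pair of tokens, so it grows with the square of the token count. High-resolution feature maps therefore dominate the cost. The sketch below counts tokens and attention-matrix entries at several strides. The 1024×1024 input and the stride set {4, 8, 16, 32} are illustrative assumptions, not settings taken from the paper. The sketch also assumes global attention at every stride, which is not necessarily what any particular multistage architecture does.

```python
def num_tokens(image_size: int, stride: int) -> int:
    """Tokens produced when a square image is split into stride x stride patches."""
    assert image_size % stride == 0, "image size must be divisible by the stride"
    side = image_size // stride
    return side * side


def global_attention_entries(image_size: int, stride: int) -> int:
    """Entries in one head's N x N attention matrix under full global self-attention."""
    n = num_tokens(image_size, stride)
    return n * n


IMAGE_SIZE = 1024  # hypothetical input resolution, chosen only for illustration

# Hypothetical feature-map strides: stride 16 stands in for a single-scale
# layout; strides 4, 8, 16 and 32 stand in for a CNN-style multistage pyramid.
for stride in (4, 8, 16, 32):
    tokens = num_tokens(IMAGE_SIZE, stride)
    entries = global_attention_entries(IMAGE_SIZE, stride)
    print(f"stride {stride:>2}: {tokens:>6} tokens, {entries:>13,} attention entries")
```

Under these assumptions, a stride-4 map has 16 times as many tokens as a stride-16 map. Its global attention matrix is therefore 256 times larger. This arithmetic explains why a design that aggregates multiscale context must also keep an eye on computational cost.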
