Feb. 8, 2024, 5:47 a.m. | Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, Lei Li

cs.CV updates on arXiv.org

In recent advancements in medical image analysis, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have set significant benchmarks. While the former excel at capturing local features through convolution operations, the latter achieve remarkable global context understanding by leveraging self-attention mechanisms. However, both architectures exhibit limitations in efficiently modeling long-range dependencies within medical images, a critical aspect for precise segmentation. Inspired by the Mamba architecture, known for its proficiency in handling long sequences and global contextual information …
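The contrast the abstract draws can be sketched in a toy 1-D example: a convolution's output sees only a fixed local window, while a linear state-space recurrence (the core mechanism behind Mamba-style models) propagates information across the entire sequence at linear cost. This is an illustrative sketch with made-up scalar parameters, not the paper's actual model:

```python
import numpy as np

def conv1d(x, kernel):
    """Plain 'valid' 1-D convolution: each output sees only len(kernel) inputs."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    Each output depends on all earlier inputs, decaying with distance (a**d).
    Illustrative scalar version; real SSMs use learned matrices per channel."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return np.array(ys)

x = np.zeros(8)
x[0] = 1.0                      # a single impulse at position 0
print(conv1d(x, np.ones(3)))    # impulse leaves the 3-tap conv window quickly
print(ssm_scan(x))              # impulse echoes through every later step (0.9**t)
```

The recurrence is what gives such models their long-range reach: unlike self-attention, the per-step cost does not grow with sequence length.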

