Feb. 29, 2024, 5:45 a.m. | Young Kyung Kim, J. Matías Di Martino, Guillermo Sapiro

cs.CV updates on arXiv.org arxiv.org

arXiv:2402.17863v1 Announce Type: new
Abstract: Tokens or patches within Vision Transformers (ViT) lack essential semantic information, unlike their counterparts in natural language processing (NLP). Typically, ViT tokens are associated with rectangular image patches that lack specific semantic context, making interpretation difficult and failing to effectively encapsulate information. We introduce a novel transformer model, Semantic Vision Transformers (sViT), which leverages recent progress on segmentation models to design novel tokenizer strategies. sViT effectively harnesses semantic information, creating an inductive bias reminiscent of …
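The core idea in the abstract, replacing fixed rectangular patches with tokens derived from semantic segments, can be illustrated with a minimal sketch. This is an assumption about the general technique, not the authors' exact sViT tokenizer: given a per-pixel feature map and a segmentation mask (e.g. from an off-the-shelf segmenter), each token is the pooled embedding over one segment.

```python
# Minimal sketch of segmentation-based tokenization (illustrative only,
# not the sViT implementation): one token per semantic segment instead of
# one token per fixed rectangular patch.
import numpy as np

def semantic_tokens(features: np.ndarray, segments: np.ndarray) -> np.ndarray:
    """Average per-pixel features within each segment to form one token each.

    features: (H, W, C) per-pixel feature map
    segments: (H, W) integer segment labels from any segmentation model
    returns:  (num_segments, C) token matrix, one row per segment id
    """
    ids = np.unique(segments)
    return np.stack([features[segments == i].mean(axis=0) for i in ids])

# Toy example: a 4x4 "image" with 2-dim features and two segments.
feats = np.arange(32, dtype=float).reshape(4, 4, 2)
segs = np.zeros((4, 4), dtype=int)
segs[:, 2:] = 1  # right half of the image is segment 1
tokens = semantic_tokens(feats, segs)
print(tokens.shape)  # (2, 2): two semantic tokens, each 2-dimensional
```

Unlike a fixed patch grid, the number of tokens here varies with image content, which is one way such a tokenizer can inject a semantic inductive bias.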
