all AI news
Multimodal Transformer With a Low-Computational-Cost Guarantee
Feb. 26, 2024, 5:41 a.m. | Sungjin Park, Edward Choi
cs.LG updates on arXiv.org arxiv.org
Abstract: Transformer-based models have significantly improved performance across a range of multimodal understanding tasks, such as visual question answering and action recognition. However, multimodal Transformers significantly suffer from a quadratic complexity of the multi-head attention with the input sequence length, especially as the number of modalities increases. To address this, we introduce Low-Cost Multimodal Transformer (LoCoMT), a novel multimodal attention mechanism that aims to reduce computational cost during training and inference with minimal performance loss. Specifically, …
abstract action recognition arxiv attention complexity computational cost cs.cv cs.lg cs.mm head low multi-head multi-head attention multimodal performance question question answering recognition tasks transformer transformers type understanding visual
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Data Analyst (Digital Business Analyst)
@ Activate Interactive Pte Ltd | Singapore, Central Singapore, Singapore