March 5, 2024, 2:43 p.m. | Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang

cs.LG updates on arXiv.org

arXiv:2403.00818v1 Announce Type: cross
Abstract: Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space models (SSMs) are a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. By selectively integrating shallow-layer hidden states into deeper …
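To make the core idea concrete, here is a minimal sketch of what "selectively integrating shallow-layer hidden states into deeper layers" could look like in code. This is not the authors' implementation: the module names (`SimpleSSMBlock`, `DenseFusion`, `DenseSSMSketch`), the placeholder block, and the gated fusion rule are all assumptions made for illustration.

```python
# Hypothetical sketch of dense hidden-state fusion across SSM-style layers.
# Shallow-layer hidden states are projected and selectively injected into
# deeper layers via a content-dependent gate. Not the DenseSSM reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSMBlock(nn.Module):
    """Placeholder for an SSM layer; a real SSM would run a state-space scan here."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj_in = nn.Linear(dim, dim)
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq, dim) -> (batch, seq, dim)
        return self.proj_out(F.silu(self.proj_in(x)))


class DenseFusion(nn.Module):
    """Selectively mixes hidden states from all shallower layers into the current input."""

    def __init__(self, dim: int, num_prev: int):
        super().__init__()
        self.proj = nn.Linear(dim * num_prev, dim)  # fuse concatenated shallow states
        self.gate = nn.Linear(dim, dim)             # content-dependent selection gate

    def forward(self, current: torch.Tensor, previous: list[torch.Tensor]) -> torch.Tensor:
        shallow = self.proj(torch.cat(previous, dim=-1))
        gate = torch.sigmoid(self.gate(current))
        return current + gate * shallow             # inject selected shallow information


class DenseSSMSketch(nn.Module):
    """Stack of SSM-style blocks with dense connections to all shallower hidden states."""

    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList(SimpleSSMBlock(dim) for _ in range(depth))
        self.fusions = nn.ModuleList(DenseFusion(dim, num_prev=i) for i in range(1, depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hiddens: list[torch.Tensor] = []
        for i, block in enumerate(self.blocks):
            if i > 0:
                # Dense connection: fuse all shallower hidden states before this layer.
                x = self.fusions[i - 1](x, hiddens)
            x = block(x)
            hiddens.append(x)
        return x


if __name__ == "__main__":
    # Toy forward pass to check shapes.
    model = DenseSSMSketch(dim=64, depth=4)
    out = model(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

The key design choice illustrated is that each deeper layer sees a gated summary of every shallower layer's hidden state rather than only its immediate predecessor, which is what distinguishes a "dense" connection pattern from a plain residual stack.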

