Feb. 8, 2024, 5:46 a.m. | Qingyu Yin, Xuzheng He, Xiang Zhuang, Yu Zhao, Jianhua Yao, Xiaoyu Shen, Qiang Zhang

cs.CL updates on arXiv.org

The decoder-only Transformer architecture with causal masking and relative position encoding (RPE) has become the de facto choice in language modeling. Despite its exceptional performance across various tasks, we have identified two limitations: First, it requires all attention scores to be non-zero and sum to 1, even if the current embedding has sufficient self-contained information. This compels the model to assign disproportionately excessive attention to specific tokens. Second, RPE-based Transformers are not universal approximators due to their limited capacity …
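As a quick illustration of the first limitation, the sketch below (not taken from the paper; all names, shapes, and values are illustrative assumptions) computes standard causal softmax attention and checks that each query's weights over the visible tokens are strictly positive and sum to 1, so a token can never allocate zero attention even when its own embedding already suffices.

```python
# A minimal NumPy sketch (not from the paper) of the first limitation: with
# standard causal softmax attention, every query's weights are forced to be
# positive and to sum to 1 over the visible tokens, even when the token's own
# embedding is already self-sufficient. All names and shapes are illustrative.
import numpy as np

def causal_softmax_attention(Q, K):
    """Return the (T, T) causal attention weight matrix for queries Q and keys K."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # raw attention scores
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)          # causal mask: hide future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 5, 8
Q, K = rng.normal(size=(T, d)), rng.normal(size=(T, d))

W = causal_softmax_attention(Q, K)
print(W.sum(axis=-1))                     # each row sums to 1 by construction
print((W[np.tril_indices(T)] > 0).all())  # every visible position gets nonzero weight
```

Because the softmax leaves no way to assign zero weight, any "unneeded" probability mass must land somewhere, which is the behavior the abstract describes as disproportionate attention to specific tokens.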

Categories: cs.CL, cs.AI
