SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models | allainews.com

Feb. 19, 2024, 5:48 a.m. | Weixiang Zhao, Shilong Wang, Yulin Hu, Yanyan Zhao, Bing Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

cs.CL updates on arXiv.org arxiv.org

arXiv:2401.08295v2 Announce Type: replace
Abstract: The continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world. Existing methods devise the learning module to acquire task-specific knowledge with parameter-efficient tuning (PET) block and the selection module to pick out the corresponding one for the testing input, aiming at handling the challenges of catastrophic forgetting and knowledge transfer in CL. However, these methods tend to address only one of the challenges, ignoring the potential of aligning …

abstract arxiv attention block continual cs.cl dynamic framework knowledge language language models large language large language models llms pet type vital world

More from arxiv.org / cs.CL updates on arXiv.org

Biomedical knowledge graph-optimized prompt generation for large language models 20 hours ago | arxiv.org

abstract arxiv biomedical biomedicine +27

Primacy Effect of ChatGPT 20 hours ago | arxiv.org

arxiv chatgpt cs.ai cs.cl +2

Are Models Trained on Indian Legal Data Fair? 20 hours ago | arxiv.org

abstract advances applications artificial +27

Silver-Tongued and Sundry: Exploring Intersectional Pronouns with ChatGPT 20 hours ago | arxiv.org

abstract agent arxiv chatgpt +13

Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design 20 hours ago | arxiv.org

abstract agent ai-powered ai systems +21

Robot Detection System 1: Front-Following 20 hours ago | arxiv.org

abstract advantages arxiv cs.cl +14

Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram 20 hours ago | arxiv.org

abstract annotation arxiv biomedical +12

Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in … 20 hours ago | arxiv.org

abstract arxiv beyond cs.ai +15

From Text to Context: An Entailment Approach for News Stakeholder Classification 20 hours ago | arxiv.org

abstract actors articles arxiv +13

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

View on ai-jobs.net

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

View on ai-jobs.net

Lead Developer (AI)

@ Cere Network | San Francisco, US

View on ai-jobs.net

Research Engineer

@ Allora Labs | Remote

View on ai-jobs.net

Ecosystem Manager

@ Allora Labs | Remote

View on ai-jobs.net

Founding AI Engineer, Agents

@ Occam AI | New York

View on ai-jobs.net