Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective | allainews.com

May 9, 2024, 4:42 a.m. | Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

cs.LG updates on arXiv.org

arXiv:2310.11451v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has explored operations on these parameters to manipulate the underlying implicit knowledge (including detection, editing, and merging), their transferability across models of varying scales remains poorly understood. In this paper, we empirically investigate knowledge transfer from larger to smaller models from a parametric perspective. To achieve this, we …
