June 26, 2024, 4:45 a.m. | Fu Feng, Yucheng Xie, Jing Wang, Xin Geng

cs.LG updates on arXiv.org

arXiv:2406.17503v1 Announce Type: new
Abstract: The expansion of model parameters underscores the significance of pre-trained models; however, the constraints encountered during model deployment necessitate models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address the initialization problem when target models are incompatible with pre-trained models. We tackle this issue from a multitasking perspective and introduce \textbf{WAVE}, which incorporates a set of shared \textbf{W}eight templates for \textbf{A}daptive initialization of \textbf{V}ariable-siz\textbf{E}d Models. During initialization, target models will initialize …
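The truncated abstract does not spell out how the shared weight templates are combined, so the following is only a minimal sketch of one plausible reading: target layer weights are assembled as learned linear combinations of a small shared template bank and then tiled to whatever shape the target model needs. All names here (WeightTemplateBank, build_layer_weight, the mixing coefficients) are hypothetical illustrations, not WAVE's actual API.

import torch
import torch.nn as nn

class WeightTemplateBank(nn.Module):
    """A shared bank of small weight templates (hypothetical illustration)."""
    def __init__(self, num_templates: int, template_shape):
        super().__init__()
        # Shared templates; in this sketch they would be learned during pre-training.
        self.templates = nn.Parameter(torch.randn(num_templates, *template_shape) * 0.02)

    def build_layer_weight(self, coeffs: torch.Tensor, out_dim: int, in_dim: int) -> torch.Tensor:
        """Mix templates with per-layer coefficients, then tile/crop to the
        target layer's shape so models of different sizes can share one bank."""
        base = torch.einsum("t,tij->ij", coeffs, self.templates)   # weighted sum of templates
        th, tw = base.shape
        reps = (-(-out_dim // th), -(-in_dim // tw))                # ceil division for tiling
        return base.repeat(*reps)[:out_dim, :in_dim]

# Usage sketch: initialize a layer of an arbitrary target size from the shared bank.
bank = WeightTemplateBank(num_templates=8, template_shape=(64, 64))
coeffs = torch.softmax(torch.randn(8), dim=0)                       # hypothetical per-layer mixing weights
layer = nn.Linear(in_features=96, out_features=192)
with torch.no_grad():
    layer.weight.copy_(bank.build_layer_weight(coeffs, 192, 96))

The point of the sketch is only the general idea the abstract describes: the pre-trained knowledge lives in a size-agnostic template bank, so variable-sized target models can be initialized from it rather than from a size-matched checkpoint.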

