The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter. (arXiv:2306.03805v2 [cs.LG] UPDATED)
cs.LG updates on arXiv.org
Large pre-trained transformers steal the show in modern-day deep learning,
and it becomes crucial to understand the parsimonious patterns that exist
within them as they grow in scale. With exploding parameter counts, the
Lottery Ticket Hypothesis (LTH) and its variants have lost their practicality
for sparsifying these models, due to the high computation and memory cost of
the repetitive train-prune-retrain routine of iterative magnitude pruning
(IMP), which worsens with increasing model size. This paper comprehensively
studies induced sparse patterns across multiple large pre-trained vision and …
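The train-prune-retrain routine blamed here for IMP's cost is simple to state in code. Below is a minimal PyTorch sketch of one LTH-style loop, assuming a hypothetical caller-supplied `train_fn` that retrains the model while keeping masked weights at zero; the 20% per-round pruning rate and five rounds are common LTH conventions, not figures from this paper.

```python
import copy
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=5, prune_frac=0.2):
    """Sketch of the LTH train-prune-retrain loop the abstract describes.

    `train_fn(model, masks)` is a hypothetical routine that trains the
    model to convergence while zeroing masked weights after each step.
    """
    init_state = copy.deepcopy(model.state_dict())  # kept for weight rewinding
    # One binary mask per weight matrix (biases are left dense).
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)  # the expensive, repeated training pass
        with torch.no_grad():
            # Global magnitude criterion: threshold over surviving weights.
            surviving = torch.cat([p[masks[n].bool()].abs()
                                   for n, p in model.named_parameters()
                                   if n in masks])
            k = max(1, int(prune_frac * surviving.numel()))
            threshold = surviving.kthvalue(k).values
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] *= (p.abs() > threshold).float()
            # Rewind surviving weights to initialization (classic LTH),
            # then re-apply the masks so pruned weights stay zero.
            model.load_state_dict(init_state)
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])
    return model, masks
```

Each round repeats a full training run, which is exactly the bottleneck the abstract points to as model sizes explode.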