Aug. 11, 2023, 6:45 a.m. | Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang

cs.LG updates on arXiv.org

Large pre-trained transformers are the show-stealers of modern deep learning,
and as they grow in scale it becomes crucial to understand the parsimonious
patterns that exist within them. With exploding parameter counts, the Lottery
Ticket Hypothesis (LTH) and its variants have lost their pragmatism for
sparsifying these models, because the repetitive train-prune-retrain routine
of iterative magnitude pruning (IMP) imposes computation and memory
bottlenecks that worsen with increasing model size. This paper comprehensively
studies induced sparse patterns across multiple large pre-trained vision and …
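To make the bottleneck concrete, here is a minimal sketch of the train-prune-retrain cycle of iterative magnitude pruning that the abstract describes, written with PyTorch's built-in pruning utilities. The model, data loader, round counts, and the training helper are illustrative assumptions, not the paper's actual code; the point is that every additional round repeats a full retraining pass, which is what becomes prohibitive at pre-trained-transformer scale.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train_one_round(model, loader, epochs=1, lr=1e-3):
    """Placeholder (re)training step of the train-prune-retrain cycle."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def iterative_magnitude_prune(model, loader, rounds=3, amount_per_round=0.2):
    """Repeat: train, then globally prune the smallest-magnitude weights.

    Each round requires a full retraining pass, which is the computation
    and memory cost the abstract argues makes LTH-style sparsification
    impractical for large pre-trained transformers.
    """
    # Collect the weight tensors of all linear layers for global pruning.
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
    for _ in range(rounds):
        train_one_round(model, loader)            # (re)train the current model
        prune.global_unstructured(                # remove lowest-magnitude weights
            params, pruning_method=prune.L1Unstructured, amount=amount_per_round
        )
    # Fold the accumulated masks into the weights to make sparsity permanent.
    for module, name in params:
        prune.remove(module, name)
    return model

One-shot magnitude pruning, by contrast, would call prune.global_unstructured a single time after training, avoiding the repeated retraining loop entirely.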

