Feb. 21, 2024, 5:41 a.m. | Akash Guna R. T, Arnav Chavan, Deepak Gupta

cs.LG updates on arXiv.org

arXiv:2402.12418v1 Announce Type: new
Abstract: Conventional scaling of neural networks typically involves designing a base network and growing dimensions such as width and depth by predefined scaling factors. We introduce an automated scaling approach that leverages second-order loss-landscape information. Our method is flexible with respect to skip connections, a mainstay in modern vision transformers. Our training-aware method jointly scales and trains transformers without additional training iterations. Motivated by the hypothesis that not all neurons need uniform depth complexity, …
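To make the contrast concrete, the *conventional* scaling the abstract refers to can be sketched as follows. This is an illustrative example of fixed-factor compound scaling only, not the paper's automated method; the function name and factors are hypothetical.

```python
def scale_dims(base_width: int, base_depth: int,
               width_factor: float, depth_factor: float) -> tuple[int, int]:
    """Grow a base network's width (channels) and depth (layers)
    by predefined scaling factors, as in conventional compound scaling.
    Purely illustrative -- not the automated, loss-landscape-aware
    method proposed in the paper."""
    scaled_width = round(base_width * width_factor)
    scaled_depth = round(base_depth * depth_factor)
    return scaled_width, scaled_depth

# e.g. a base of 64 channels / 12 layers, scaled 1.5x in width, 2x in depth
print(scale_dims(64, 12, 1.5, 2.0))  # (96, 24)
```

The paper's contribution is to replace such hand-picked factors with scaling decisions derived from second-order loss-landscape information, made jointly with training.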

