Feb. 28, 2024, 5:42 a.m. | Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin

cs.LG updates on arXiv.org

arXiv:2402.17440v1 Announce Type: new
Abstract: Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process. Current works try to automatically optimize hyperparameters or to design principles for setting them, so that they generalize to diverse unseen scenarios. However, most such designs or optimization methods are agnostic to the choice of network structure, and thus largely ignore the impact of neural architectures on hyperparameters. In this work, we precisely characterize the dependence of initializations and maximal learning …
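The announcement is truncated before the paper's actual scaling rules, so the sketch below only illustrates the general idea of architecture-dependent hyperparameters: fan-in-scaled (He) initialization plus a width-dependent learning-rate correction. The helper names `init_mlp` and `scaled_lr`, and the 1/width learning-rate factor, are illustrative assumptions, not the prescriptions derived in arXiv:2402.17440.

```python
import torch.nn as nn

def init_mlp(depth: int, width: int, in_dim: int, out_dim: int) -> nn.Sequential:
    """Build a ReLU MLP whose initialization scale depends on each layer's fan-in.

    Standard He (Kaiming) initialization is used here as a stand-in for an
    architecture-aware scheme; the paper's exact rule is not reproduced.
    """
    dims = [in_dim] + [width] * depth + [out_dim]
    layers = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        linear = nn.Linear(fan_in, fan_out)
        # Weight variance ~ 2 / fan_in keeps pre-activation scale roughly
        # constant as the width changes (He et al., 2015).
        nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing activation

def scaled_lr(base_lr: float, width: int, base_width: int = 128) -> float:
    # Hypothetical width-dependent rule: shrink the base rate as the network
    # widens so the step size stays below the (width-dependent) stability
    # threshold. The 1/width factor is illustrative, not the paper's result.
    return base_lr * base_width / width

# Example: a wider network gets a proportionally smaller learning rate.
net = init_mlp(depth=8, width=512, in_dim=32, out_dim=10)
lr = scaled_lr(0.1, width=512)  # 0.1 * 128 / 512 = 0.025
```

The point of the sketch is only that both the initialization variance and the stable learning rate are functions of the architecture (here, fan-in and width), which is the dependence the paper sets out to characterize precisely.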
