Feb. 9, 2024, 5:43 a.m. | Yichuan Deng Hang Hu Zhao Song Omri Weinstein Danyang Zhuo

cs.LG updates on arXiv.org

The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI). Despite the popularity and low cost-per-iteration of traditional backpropagation via gradient descent, stochastic gradient descent (SGD) has a prohibitive convergence rate in non-convex settings, both in theory and practice.
To mitigate this cost, recent works have proposed to employ alternative (Newton-type) training methods with much faster convergence …
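As a rough illustration of the trade-off the abstract describes (and not the paper's actual algorithm), the sketch below contrasts a plain first-order gradient step with a generic Newton-type step on a toy quadratic loss. The function names (loss_grad, loss_hessian) and the toy problem are hypothetical stand-ins: the Newton-type update converges in far fewer iterations, but each iteration requires a linear solve rather than a cheap gradient step.

```python
# Illustrative sketch only: NOT the paper's method. Contrasts a plain
# gradient-descent update with a Newton-type update on a toy quadratic.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
Q = A.T @ A + np.eye(d)           # positive-definite curvature of the toy loss
b = rng.standard_normal(d)

def loss_grad(theta):
    # Gradient of the toy quadratic loss 0.5 * theta^T Q theta - b^T theta.
    return Q @ theta - b

def loss_hessian(theta):
    # Hessian of the toy loss (constant for a quadratic).
    return Q

theta_gd = np.zeros(d)
theta_newton = np.zeros(d)
eta = 1.0 / np.linalg.eigvalsh(Q).max()   # safe step size for the first-order method

for _ in range(100):
    # First-order step: cheap per iteration, but convergence depends on conditioning.
    theta_gd -= eta * loss_grad(theta_gd)

    # Newton-type step: solve H p = g, then move by -p.
    # Far fewer iterations needed, but each one costs a linear solve.
    g = loss_grad(theta_newton)
    H = loss_hessian(theta_newton)
    theta_newton -= np.linalg.solve(H, g)

theta_star = np.linalg.solve(Q, b)
print("Gradient-descent error:", np.linalg.norm(theta_gd - theta_star))
print("Newton-type error:     ", np.linalg.norm(theta_newton - theta_star))
```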
