Web: http://arxiv.org/abs/2201.12250

Jan. 31, 2022, 2:11 a.m. | Frederik Benzing

cs.LG updates on arXiv.org

Second-order optimizers are thought to hold the potential to speed up neural
network training, but due to the enormous size of the curvature matrix, they
typically require approximations to be computationally tractable. The most
successful family of approximations is Kronecker-Factored, block-diagonal
curvature estimates (KFAC). Here, we combine tools from prior work to evaluate
exact second-order updates with careful ablations to establish a surprising
result: Due to its approximations, KFAC is not closely related to second-order
updates, and in particular, it …
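
As a rough illustration of what "Kronecker-factored" buys computationally, the sketch below checks the standard identity that makes such estimates cheap to invert for a single fully connected layer. It is not taken from the paper: the layer sizes, the random stand-in data, the damping value, and the names `A`, `G`, and `grad_W` are all assumptions chosen for the example, and the factor definitions follow the usual textbook form of the approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, batch = 4, 3, 256  # hypothetical layer and batch sizes

# Stand-in data: per-example layer inputs and backpropagated output gradients.
a = rng.normal(size=(batch, n_in))
g = rng.normal(size=(batch, n_out))

damping = 1e-3  # illustrative value, not from the source
A = a.T @ a / batch + damping * np.eye(n_in)   # input second-moment factor
G = g.T @ g / batch + damping * np.eye(n_out)  # output-gradient second-moment factor

grad_W = rng.normal(size=(n_out, n_in))  # stand-in for the weight gradient

# Kronecker-factored update: G^{-1} grad_W A^{-1}, using only the small factors.
# A is symmetric, so X A^{-1} equals solve(A, X.T).T.
update = np.linalg.solve(A, np.linalg.solve(G, grad_W).T).T

# Reference: invert the full (n_in * n_out)-dimensional block A kron G directly.
# Column-major vec is used so that (A kron G)^{-1} vec(V) = vec(G^{-1} V A^{-1}).
full = np.linalg.solve(np.kron(A, G), grad_W.flatten(order="F"))
full = full.reshape((n_out, n_in), order="F")

assert np.allclose(update, full)
```

The factored route solves systems of size `n_in` and `n_out` separately, whereas the reference route handles one system of size `n_in * n_out`, which is what becomes intractable for realistic layers.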


