Feb. 13, 2024, 5:42 a.m. | Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon

cs.LG updates on arXiv.org

There is a notable dearth of results characterizing the preconditioning effect of Adam and showing how it may alleviate the curse of ill-conditioning -- an issue plaguing gradient descent (GD). In this work, we perform a detailed analysis of Adam's preconditioning effect for quadratic functions and quantify to what extent Adam can mitigate the dependence on the condition number of the Hessian. Our key finding is that Adam can suffer less from the condition number but at the expense of …
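As a rough illustration of the ill-conditioning issue the abstract describes, the sketch below compares one step of GD with one step of the standard Adam update on a diagonal quadratic f(x) = 0.5 * sum(h_i * x_i^2). It is not the paper's analysis: the function names (`gd_step`, `adam_step`, `gd_steps_to_tol`), the two-dimensional Hessian, the starting point, and the hyperparameters are all assumptions chosen for illustration.

```python
import math


def gd_step(x, h, lr):
    """One gradient-descent step on f(x) = 0.5 * sum(h_i * x_i**2)."""
    return [xi - lr * hi * xi for xi, hi in zip(x, h)]


def adam_step(x, h, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam step with bias correction on the same quadratic.

    m and v are the running first and second moment estimates; t is the
    1-based step count. Hyperparameter defaults are illustrative assumptions.
    """
    g = [hi * xi for xi, hi in zip(x, h)]  # gradient of the quadratic
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]  # bias-corrected moments
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    x = [xi - lr * mh / (math.sqrt(vh) + eps)
         for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x, m, v


def gd_steps_to_tol(kappa, tol=1e-6, max_steps=10**6):
    """Count GD steps (step size 1/kappa) until the loss drops below tol.

    Uses the hypothetical Hessian diag(1, kappa) and start point (1, 1).
    """
    h = [1.0, float(kappa)]
    x = [1.0, 1.0]
    lr = 1.0 / kappa
    for k in range(max_steps):
        loss = 0.5 * sum(hi * xi * xi for xi, hi in zip(x, h))
        if loss <= tol:
            return k
        x = gd_step(x, h, lr)
    return max_steps


if __name__ == "__main__":
    h = [1.0, 1000.0]  # condition number 1000 (assumed for illustration)
    print("GD step:  ", gd_step([1.0, 1.0], h, lr=1.0 / 1000.0))
    x, _, _ = adam_step([1.0, 1.0], h, [0.0, 0.0], [0.0, 0.0], t=1)
    print("Adam step:", x)
    print("GD steps to tol, kappa=10:  ", gd_steps_to_tol(10))
    print("GD steps to tol, kappa=1000:", gd_steps_to_tol(1000))
```

With step size 1/kappa, GD barely moves the low-curvature coordinate, so its step count grows with the condition number. The first Adam step moves both coordinates by roughly `lr` regardless of their gradient scale, which is the per-coordinate rescaling usually meant by Adam's preconditioning. The sketch says nothing about the trade-off the abstract goes on to describe.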

