Feb. 13, 2024, 5:42 a.m. | Rudrajit Das Naman Agarwal Sujay Sanghavi Inderjit S. Dhillon

cs.LG updates on arXiv.org arxiv.org

There is a notable dearth of results characterizing the preconditioning effect of Adam and showing how it may alleviate the curse of ill-conditioning -- an issue plaguing gradient descent (GD). In this work, we perform a detailed analysis of Adam's preconditioning effect for quadratic functions and quantify to what extent Adam can mitigate the dependence on the condition number of the Hessian. Our key finding is that Adam can suffer less from the condition number but at the expense of …

adam analysis cs.lg functions gradient issue math.oc stat.ml work

Research Scholar (Technical Research)

@ Centre for the Governance of AI | Hybrid; Oxford, UK

HPC Engineer (x/f/m) - DACH

@ Meshcapade GmbH | Remote, Germany

Data Architect

@ Dyson | India - Bengaluru IT Capability Centre

GTM Operation and Marketing Data Analyst

@ DataVisor | Toronto, Ontario, Canada - Remote

Associate - Strategy & Business Intelligence

@ Hitachi | (HE)Office Rotterdam

Senior Executive - Data Analysis

@ Publicis Groupe | Beirut, Lebanon