April 21, 2023, 3:46 a.m. | Chris Mingard

Towards Data Science, via Medium (towardsdatascience.com)

Towards architecture-aware optimisation

TL;DR We’ve derived an optimiser called automatic gradient descent (AGD) that can train ImageNet without hyperparameters. This removes the need for expensive and time-consuming learning-rate tuning, selection of learning-rate decay schedules, and so on. Our paper can be found here.

I worked on this project with Jeremy Bernstein, Kevin Huang, Navid Azizan and Yisong Yue. See Jeremy’s GitHub for a clean PyTorch implementation, or my GitHub for an experimental version with more features. …
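To make concrete what “no hyperparameters” removes, here is a minimal PyTorch sketch of a conventional training step, showing the learning rate, momentum, weight decay and decay schedule that AGD is designed to do away with. The model, data and hyperparameter values below are illustrative placeholders, not values from the paper; see the linked repositories for AGD’s actual interface.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Toy model and loss; stand-ins for a real architecture such as a ResNet.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
loss_fn = nn.CrossEntropyLoss()

# Conventional setup: every argument here typically needs tuning
# per architecture and dataset. These are the knobs AGD removes.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=100)

# With AGD, the optimiser block above would carry no learning rate or
# schedule; the step size is set automatically from the architecture.

# Dummy training loop on random data, just to make the step concrete.
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for epoch in range(2):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```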

gradient hyperparameter-tuning imagenet machine learning optimization-algorithms python thoughts-and-theory
