April 21, 2023, 3:46 a.m. | Chris Mingard

Towards Data Science (towardsdatascience.com)

Towards architecture-aware optimisation

TL;DR We’ve derived an optimiser called automatic gradient descent (AGD) that can train ImageNet without any hyperparameters. This removes the need for expensive, time-consuming learning-rate tuning, the selection of learning-rate decay schedules, and so on. Our paper can be found here.

I worked on this project with Jeremy Bernstein, Kevin Huang, Navid Azizan and Yisong Yue. See Jeremy’s GitHub for a clean PyTorch implementation, or my GitHub for an experimental version with more features. …
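For a taste of what “no hyperparameters” means in practice, here is a minimal PyTorch sketch of an AGD-style update for fully-connected layers. The dimension scalings and the closed-form step size mirror the algorithm described in the paper, but treat this as an illustration rather than the reference implementation: the `agd_step` helper is a name invented for this sketch, biases, convolutions and the paper’s initialisation scheme are omitted, and the repos linked above contain the actual code.

```python
import math
import torch

def agd_step(weights, grads):
    """One hyperparameter-free update in the spirit of AGD (sketch only).

    `weights` and `grads` are matching lists of 2-D weight matrices
    (shape [d_out, d_in]) and their gradients. The step size eta is
    derived from the gradients themselves, so there is no learning
    rate to tune. Biases, conv layers and the initialisation scheme
    are omitted; see the linked repos for the real algorithm.
    """
    depth = len(weights)

    # Gradient summary G: depth-averaged, dimension-scaled gradient norms.
    G = sum(
        math.sqrt(W.shape[1] / W.shape[0]) * g.norm().item()
        for W, g in zip(weights, grads)
    ) / depth

    # Automatic step size: the only "learning rate" in the method,
    # and it is computed from G rather than tuned by the user.
    eta = math.log((1 + math.sqrt(1 + 4 * G)) / 2)

    with torch.no_grad():
        for W, g in zip(weights, grads):
            # Per-layer update: normalised gradient, rescaled by the
            # layer's aspect ratio and the network depth.
            scale = math.sqrt(W.shape[0] / W.shape[1])
            W -= eta / depth * scale * g / (g.norm() + 1e-12)

# Toy usage: a single update on a two-layer linear network.
torch.manual_seed(0)
W1 = torch.randn(64, 32, requires_grad=True)
W2 = torch.randn(10, 64, requires_grad=True)
x, y = torch.randn(8, 32), torch.randn(8, 10)
loss = ((x @ W1.T @ W2.T - y) ** 2).mean()
loss.backward()
agd_step([W1, W2], [W1.grad, W2.grad])
```

The point of the sketch is the shape of the method: every quantity that would normally be a tuning knob (step size, per-layer scaling) is computed from the architecture’s layer dimensions and the current gradients.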

gradient hyperparameter-tuning imagenet machine learning optimization-algorithms python thoughts-and-theory
