March 28, 2024, 4:41 a.m. | Philip Kenneweg, Leonardo Galli, Tristan Kenneweg, Barbara Hammer

cs.LG updates on arXiv.org

arXiv:2403.18506v1 Announce Type: new
Abstract: Recent work has shown that line search methods greatly improve the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work we extend line search methods to the novel and highly popular Transformer architecture and to dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network's architecture into sensible units and …
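The abstract combines an Armijo line search with the Adam optimizer. As a rough illustration of the underlying idea, below is a minimal NumPy sketch of the standard Armijo backtracking condition applied to a generic descent direction. The function names, hyperparameter values, and toy objective are illustrative assumptions, not the paper's implementation, and the per-unit subdivision the abstract mentions is omitted.

```python
import numpy as np

def armijo_line_search(f, grad_f, w, direction,
                       eta0=1.0, c=1e-4, beta=0.5, max_iters=20):
    """Backtracking Armijo line search (generic sketch, not the paper's variant).

    Shrinks the step size eta until the sufficient-decrease condition
        f(w + eta * d) <= f(w) + c * eta * <grad_f(w), d>
    holds, where d is a descent direction (e.g. the negative gradient for
    SGD, or an Adam-style update direction as in the paper).
    """
    fw = f(w)
    slope = np.dot(grad_f(w), direction)  # negative for a descent direction
    eta = eta0
    for _ in range(max_iters):
        if f(w + eta * direction) <= fw + c * eta * slope:
            break
        eta *= beta  # backtrack: shrink the step size and retry
    return w + eta * direction

# Toy usage on a quadratic objective f(w) = ||w||^2 (illustrative only).
f = lambda w: float(np.dot(w, w))
grad_f = lambda w: 2.0 * w
w = np.array([3.0, -4.0])
w_next = armijo_line_search(f, grad_f, w, direction=-grad_f(w))
print(w_next)  # a point with lower loss than w
```

The design point the abstract hints at is that the accepted step size is chosen per update from the observed loss rather than from a fixed learning-rate schedule, which is what makes the approach attractive for fine-tuning Transformers.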
