Faster Convergence for Transformer Fine-tuning with Line Search Methods
March 28, 2024, 4:41 a.m. | Philip Kenneweg, Leonardo Galli, Tristan Kenneweg, Barbara Hammer
cs.LG updates on arXiv.org
Abstract: Recent works have shown that line search methods greatly increase the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work, we succeed in extending line search methods to the novel and highly popular Transformer architecture and to dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network's architecture into sensible units and …
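The abstract describes combining the Armijo line search with the Adam optimizer. As a rough illustration of that core idea only, the sketch below applies an Armijo backtracking condition to an Adam-preconditioned update direction on a toy problem in PyTorch. The function names, hyperparameters, and toy model are assumptions for illustration, not the authors' implementation, and the per-unit subdivision of the architecture mentioned in the abstract is omitted.

```python
# Minimal sketch (assumed, not the paper's code): Armijo backtracking line
# search applied to an Adam-style preconditioned direction.
import torch

def adam_direction(grads, state, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the negative Adam-preconditioned direction for each parameter."""
    dirs = []
    state["t"] += 1
    for i, g in enumerate(grads):
        m, v = state["m"][i], state["v"][i]
        m.mul_(beta1).add_(g, alpha=1 - beta1)
        v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
        m_hat = m / (1 - beta1 ** state["t"])
        v_hat = v / (1 - beta2 ** state["t"])
        dirs.append(-m_hat / (v_hat.sqrt() + eps))
    return dirs

def armijo_line_search(params, loss_fn, grads, dirs,
                       lr0=1.0, c=0.1, shrink=0.5, max_iter=10):
    """Shrink the step t until f(x + t*d) <= f(x) + c * t * <grad, d>."""
    loss0 = loss_fn().item()
    gTd = sum((g * d).sum() for g, d in zip(grads, dirs)).item()
    orig = [p.detach().clone() for p in params]
    t = lr0
    for _ in range(max_iter):
        with torch.no_grad():
            for p, o, d in zip(params, orig, dirs):
                p.copy_(o + t * d)           # trial step along the direction
            new_loss = loss_fn().item()
        if new_loss <= loss0 + c * t * gTd:  # Armijo sufficient-decrease test
            break
        t *= shrink                          # backtrack and try a smaller step
    return t                                 # params hold the last tried step

# Toy usage: one line-search step on a small regression problem.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x, y = torch.randn(32, 4), torch.randn(32, 1)
loss_fn = lambda: torch.nn.functional.mse_loss(model(x), y)
params = list(model.parameters())
state = {"t": 0,
         "m": [torch.zeros_like(p) for p in params],
         "v": [torch.zeros_like(p) for p in params]}

loss_fn().backward()
grads = [p.grad.detach().clone() for p in params]
dirs = adam_direction(grads, state)
step = armijo_line_search(params, loss_fn, grads, dirs)
print(f"accepted step size: {step:.4f}, loss after step: {loss_fn().item():.4f}")
```

The key design choice sketched here is that the step size is chosen per iteration by the sufficient-decrease test rather than fixed in advance, which is what allows the method to adapt the learning rate automatically.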