April 3, 2024, 4:42 a.m. | Philip Kenneweg, Alexander Schulz, Sarah Schröder, Barbara Hammer

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.01317v1 Announce Type: cross
Abstract: Pretraining language models on large text corpora is a common practice in natural language processing. Fine-tuning of these models is then performed to achieve the best results on a variety of tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the common practice of fine-tuning with a flat learning rate for the entire network in this context. We perform a hyperparameter optimization process to find learning rate …
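The abstract is cut off before the learning-rate scheme itself, but the idea it questions, fine-tuning with one flat learning rate for the whole network, is often replaced in practice by per-layer rates. Below is a minimal sketch of that alternative, assuming a PyTorch transformer whose blocks appear as "encoder.layer.<i>" in named_parameters(); the base rate, decay factor, and layer naming are illustrative assumptions, not details from the paper.

```python
# Sketch: layer-wise learning rates for transformer fine-tuning (PyTorch).
# Assumes encoder blocks are named "encoder.layer.<i>"; values are placeholders.
import torch
from torch.optim import AdamW


def layerwise_param_groups(model, base_lr=2e-5, decay=0.9, num_layers=12):
    """Give each encoder layer its own learning rate: layers closer to the
    input get smaller rates, which is one way to limit catastrophic forgetting
    of pretrained features."""
    groups = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        lr = base_lr  # default rate for heads, embeddings, pooler, etc.
        for i in range(num_layers):
            if f"encoder.layer.{i}." in name:
                # Layer 0 (closest to the input) receives the most-decayed rate.
                lr = base_lr * (decay ** (num_layers - i))
                break
        groups.append({"params": [param], "lr": lr})
    return groups


# Usage with a hypothetical fine-tuning setup:
# optimizer = AdamW(layerwise_param_groups(model), weight_decay=0.01)
```

The paper instead searches for suitable learning rates via hyperparameter optimization; the decayed schedule above is just one common hand-crafted alternative to a flat rate.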
