Layer-wise Regularized Dropout for Neural Language Models
Feb. 27, 2024, 5:50 a.m. | Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, Xiping Hu
cs.CL updates on arXiv.org
Abstract: Dropout is an indispensable regularization technique in today's popular pre-trained neural language models. To resolve the inconsistency between training and inference caused by the randomness of dropout, some studies apply consistency training to regularize dropout at the output layer. In this paper, we propose Layer-wise Regularized Dropout (LR-Drop), a novel technique designed specifically for Transformer-based language models. Specifically, LR-Drop regularizes each Transformer layer using a layer-wise consistency training strategy. …
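The core idea in the abstract — running two stochastic (dropout-perturbed) forward passes and penalizing disagreement at every Transformer layer rather than only at the output — can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the "layers" here are simple elementwise affine maps standing in for Transformer blocks, and mean squared distance is used as the consistency measure, which may differ from the paper's actual loss. All names (`forward`, `lr_drop_loss`, etc.) are hypothetical.

```python
import random

def dropout(vec, p, rng):
    # Inverted dropout: zero each element with probability p, scale survivors.
    return [0.0 if rng.random() < p else x / (1 - p) for x in vec]

def forward(x, layers, p, rng):
    # Toy stand-in for a Transformer stack: each "layer" is an elementwise
    # affine map followed by dropout. Returns the hidden state after every
    # layer, since LR-Drop needs per-layer representations.
    hiddens = []
    h = x
    for w, b in layers:
        h = [w * v + b for v in h]
        h = dropout(h, p, rng)
        hiddens.append(h)
    return hiddens

def lr_drop_loss(x, layers, p, rng):
    # Two stochastic forward passes over the same input: the dropout masks
    # differ, so the per-layer hidden states differ.
    h1 = forward(x, layers, p, rng)
    h2 = forward(x, layers, p, rng)
    # Layer-wise consistency term: penalize the distance between the two
    # passes at every layer (here, mean squared distance, summed over layers).
    loss = 0.0
    for a, b in zip(h1, h2):
        loss += sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
    return loss
```

With dropout disabled (`p = 0.0`) the two passes are identical and the consistency loss is zero; with `p > 0` the loss is nonnegative and shrinks as the layers become robust to dropout noise, which is the behavior the regularizer encourages.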