Feb. 27, 2024, 5:50 a.m. | Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, Xiping Hu

cs.CL updates on arXiv.org

arXiv:2402.16361v1 Announce Type: new
Abstract: Among the various pre-trained neural language models that are popular today, dropout is already an indispensable regularization technique. To solve the inconsistency between training and inference caused by the randomness of dropout, some studies use consistency training to regularize dropout at the output layer. In this paper, we propose a novel Layer-wise Regularized Dropout (LR-Drop), which is specially designed for Transformer-based Language models. Specifically, LR-Drop layer-wise regularizes each Transformer layer using the consistency training strategy. …
