AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning. (arXiv:2210.05883v1 [cs.CL])
cs.CL updates on arXiv.org
Fine-tuning large pre-trained language models on downstream tasks is prone to
overfitting when training data is limited. While dropout is an effective
antidote that randomly drops a proportion of units, existing research has not
examined its effect on the self-attention mechanism. In this paper, we
investigate this problem through self-attention attribution and find that
dropping attention positions with low attribution scores accelerates training
but increases the risk of overfitting. Motivated by this observation, we …
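
The abstract's core idea, dropout guided by attribution rather than applied uniformly, can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes attribution scores are already computed, and the function name, masking strategy, and drop rate are illustrative choices.

```python
import numpy as np

def attribution_dropout(attn_logits, attribution, drop_rate=0.3, rng=None):
    """Sketch of attribution-guided dropout on attention logits.

    Positions with the highest attribution scores become drop candidates
    and are stochastically masked to -inf before the softmax, nudging the
    model to rely more on lower-attribution positions (hypothetical
    simplification of the idea described in the abstract).
    """
    rng = rng or np.random.default_rng(0)
    n = attribution.shape[-1]
    k = int(np.ceil(drop_rate * n))  # number of candidate positions to consider
    # indices of the top-k attribution positions along the last axis
    top_k = np.argsort(attribution, axis=-1)[..., -k:]
    mask = np.zeros_like(attn_logits, dtype=bool)
    np.put_along_axis(mask, top_k, True, axis=-1)
    # keep each candidate with 50% probability so dropping stays stochastic
    mask &= rng.random(mask.shape) < 0.5
    masked = np.where(mask, -np.inf, attn_logits)
    # softmax over the last axis; unmasked positions guarantee a finite max
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Calling `attribution_dropout(np.zeros(4), np.arange(4.0), drop_rate=0.5)` returns a valid attention distribution in which the two highest-attribution positions may have been zeroed out.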