Feb. 12, 2024, 5:43 a.m. | Stefana Anita, Gabriel Turinici

cs.LG updates on arXiv.org

We present a self-contained proof of the convergence rate of Stochastic Gradient Descent (SGD) when the learning rate follows an inverse time decay schedule; we then apply the results to the convergence of a modified form of policy gradient Multi-Armed Bandit (MAB) with $L2$ regularization.
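
As an illustration of the setting described above, here is a minimal sketch of SGD with an inverse time decay learning rate, $\eta_t = \eta_0 / (1 + k t)$. It is not the authors' code: the quadratic objective, the Gaussian gradient noise, and the names `inverse_time_decay`, `sgd_quadratic`, `eta0`, and `decay` are all assumptions chosen for illustration.

```python
import random


def inverse_time_decay(step, eta0=1.0, decay=1.0):
    """Learning rate eta_t = eta0 / (1 + decay * step) (hypothetical helper)."""
    return eta0 / (1.0 + decay * step)


def sgd_quadratic(target=3.0, x0=0.0, steps=5000, noise_std=1.0, seed=0):
    """Run SGD on the toy objective f(x) = 0.5 * (x - target)^2.

    The gradient is observed with additive Gaussian noise, an assumption
    made only to have a stochastic gradient to work with.
    """
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        # Unbiased noisy estimate of the true gradient (x - target).
        grad = (x - target) + rng.gauss(0.0, noise_std)
        # Step size shrinks proportionally to 1 / (1 + decay * t).
        x -= inverse_time_decay(t) * grad
    return x


if __name__ == "__main__":
    print(sgd_quadratic())
```

With the defaults `eta0 = 1` and `decay = 1`, the iterate after `steps` updates equals `target` minus the running average of the noise samples, so in this toy case the error shrinks on the order of `noise_std / sqrt(steps)`. Other choices of `eta0` and `decay` will behave differently.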


