April 16, 2024, 4:44 a.m. | Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li

cs.LG updates on arXiv.org

arXiv:2007.13985v2 Announce Type: replace-cross
Abstract: Stochastic gradient descent (SGD) and its variants have been the dominant optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of current multi-core systems such as graphics processing units (GPUs) and can reduce the number of communication rounds in distributed training settings. Thus, SGD with large-batch training has attracted considerable attention. However, existing empirical results have shown that large-batch training typically leads to a drop in …
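Below is a minimal sketch, not the paper's method, of plain mini-batch SGD on a toy least-squares problem. It illustrates the trade-off the abstract refers to: a larger batch size means fewer, more parallelizable gradient steps per epoch (attractive on GPUs and in distributed training), at the cost of less gradient noise per step. The function names and the synthetic data are assumptions for illustration only.

```python
# Minimal mini-batch SGD sketch on a synthetic least-squares problem.
# Not the method proposed in arXiv:2007.13985; for illustration only.
import numpy as np

def sgd(X, y, batch_size, lr=0.1, epochs=20, seed=0):
    """Plain mini-batch SGD on the objective 0.5 * ||X w - y||^2 / n."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)  # mini-batch gradient
            w -= lr * grad
        # Larger batch_size -> fewer steps per epoch, each easier to
        # parallelize; smaller batch_size -> more, noisier steps.
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((1024, 10))
    w_true = rng.standard_normal(10)
    y = X @ w_true + 0.01 * rng.standard_normal(1024)
    for bs in (16, 256):
        w_hat = sgd(X, y, batch_size=bs)
        print(f"batch_size={bs:4d}  error={np.linalg.norm(w_hat - w_true):.4f}")
```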

