Nov. 2, 2022, 1:11 a.m. | Masud An-Nur Islam Fahim, Jani Boutellier

cs.LG updates on arXiv.org

Methods for improving deep neural network training times and model
generalizability include various data augmentation, regularization, and
optimization approaches, which tend to be sensitive to hyperparameter settings
and make reproducibility more challenging. This work jointly considers two
recent training strategies that address model generalizability: sharpness-aware
minimization and self-distillation, and proposes the novel training strategy of
Sharpness-Aware Distilled Teachers (SADT). The experimental section of this
work shows that SADT consistently outperforms previously published training
strategies in model convergence time, test-time …
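For context on one of the two building blocks the abstract names, below is a minimal sketch of a generic sharpness-aware minimization (SAM) training step in PyTorch. This illustrates the standard two-step SAM update only; it is not the paper's SADT method, and the model, loss function, optimizer, and the radius hyperparameter rho are placeholders chosen for illustration.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # Step 1: compute gradients at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Climb to the locally "sharp" point within an L2 ball of radius rho
    # around the current weights, following the gradient direction.
    with torch.no_grad():
        grad_norm = torch.norm(
            torch.stack([p.grad.norm(p=2)
                         for p in model.parameters() if p.grad is not None]),
            p=2,
        )
        eps = {}
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)          # perturb weights toward higher loss
            eps[p] = e

    # Step 2: gradients at the perturbed weights drive the actual update.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)          # restore the original weights

    base_optimizer.step()      # apply the sharpness-aware gradients
    base_optimizer.zero_grad()
```

In use, this function would replace the usual single forward/backward pass inside a training loop; the cost is roughly two forward and two backward passes per batch, which is the trade-off SAM-style methods accept for flatter minima.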

arxiv distillation model generalization
