Feb. 19, 2024, 5:41 a.m. | Hyeonsu Jeong, Hye Won Chung

cs.LG updates on arXiv.org

arXiv:2402.10482v1 Announce Type: new
Abstract: Self-distillation (SD) is the process of training a student model using the outputs of a teacher model, with both models sharing the same architecture. Our study theoretically examines SD in multi-class classification with cross-entropy loss, exploring both multi-round SD and SD with refined teacher outputs, inspired by partial label learning (PLL). By deriving a closed-form solution for the student model's outputs, we discover that SD essentially functions as label averaging among instances with high feature …
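To make the setup concrete, below is a minimal sketch of self-distillation as described in the abstract: a student with the same architecture as the teacher is trained with cross-entropy against the teacher's soft outputs, and multi-round SD simply repeats this with each round's student acting as the next teacher. This is an illustrative assumption-laden sketch, not the paper's exact procedure; `make_model`, `loader`, and the hyperparameters are hypothetical placeholders.

import torch
import torch.nn.functional as F

def self_distill(teacher, make_model, loader, rounds=3, epochs=5, lr=1e-3, temperature=1.0):
    """Multi-round self-distillation sketch: each round's student becomes the next teacher."""
    for _ in range(rounds):
        student = make_model()          # same architecture as the teacher
        optimizer = torch.optim.SGD(student.parameters(), lr=lr)
        teacher.eval()
        student.train()
        for _ in range(epochs):
            for x, _ in loader:         # ground-truth labels are not used by the student
                with torch.no_grad():
                    # Teacher's softened predictions serve as the training targets
                    soft_targets = F.softmax(teacher(x) / temperature, dim=1)
                log_probs = F.log_softmax(student(x) / temperature, dim=1)
                # Cross-entropy between the teacher's soft labels and the student's outputs
                loss = -(soft_targets * log_probs).sum(dim=1).mean()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        teacher = student               # next round distills from the newly trained student
    return teacher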

