March 17, 2022, 11:17 p.m. | /u/optimized-adam


So cross-entropy (`H(p,q)`) and KL divergence (`KL(p||q)`) relate to each other as follows:

`H(p,q) = KL(p||q) + H(p)` and `KL(p||q) = H(p,q) - H(p)`

where `p` is the data distribution and `q` is the model distribution. When `p` is constant (as is the case in most ML problems), minimizing `H(p,q)` is equivalent to minimizing `KL(p||q)`. However, there seems to be some ambiguity about this. [One practitioner claims](https://stats.stackexchange.com/a/409271) that there is a difference in practice, because during batch gradient descent the data distribution …
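As a quick sanity check of the identity above, here is a minimal NumPy sketch with a made-up categorical `p` and `q` (the distributions are just illustrative assumptions). It confirms numerically that `H(p,q) = KL(p||q) + H(p)`, so for a fixed `p` the two objectives differ only by the constant `H(p)`:

```python
import numpy as np

# Hypothetical categorical distributions, chosen only for illustration.
p = np.array([0.7, 0.2, 0.1])   # "data" distribution (held fixed)
q = np.array([0.5, 0.3, 0.2])   # "model" distribution

cross_entropy = -np.sum(p * np.log(q))      # H(p, q)
entropy = -np.sum(p * np.log(p))            # H(p)
kl_divergence = np.sum(p * np.log(p / q))   # KL(p || q)

# H(p, q) equals KL(p || q) + H(p) up to floating-point error,
# so minimizing one w.r.t. q minimizes the other when p is fixed.
print(cross_entropy, kl_divergence + entropy)
assert np.isclose(cross_entropy, kl_divergence + entropy)
```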

