Feb. 9, 2024, 5:43 a.m. | Seongmin Lee, Marcel Böhme

cs.LG updates on arXiv.org

It might seem counter-intuitive at first: We find that, in expectation, the proportion of data points in an unknown population that belong to classes that do not appear in the training data is almost entirely determined by the numbers $f_k$ of classes that appear in the training data exactly $k$ times. While in theory we show that the difference of the induced estimator decays exponentially in the size of the sample, in practice the high variance prevents us from …
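To make the quantities concrete, here is a minimal sketch of how the counts $f_k$ can be tallied from a sample, along with the classical Good-Turing estimate of the unseen proportion, $f_1 / n$. This is the textbook estimator, not the estimator proposed in the paper; the function names and the toy sample are assumptions chosen for illustration.

```python
from collections import Counter


def frequency_of_frequencies(sample):
    """Return {k: f_k}, where f_k is the number of classes seen exactly k times."""
    class_counts = Counter(sample)                # class -> number of occurrences
    return dict(Counter(class_counts.values()))   # occurrences k -> number of classes f_k


def good_turing_missing_mass(sample):
    """Classical Good-Turing estimate of the unseen proportion: f_1 / n.

    Illustrative stand-in only; the paper's own estimator is not reproduced here.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    f = frequency_of_frequencies(sample)
    return f.get(1, 0) / n


# Hypothetical toy sample: class labels observed in the training data.
sample = ["a", "a", "a", "b", "b", "c", "d"]
f = frequency_of_frequencies(sample)        # two singletons, one doubleton, one tripleton
estimate = good_turing_missing_mass(sample)  # f_1 / n = 2 / 7
print(f, estimate)
```

The Good-Turing estimate uses only $f_1$, the number of classes seen once. The abstract's claim is broader: the full set of counts $f_k$ carries almost all of the information about the unseen proportion.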

cs.lg cs.ne data information population stat.ml training training data

