Leave-One-Out Learning with Log-Loss
Yaniv Fogel, Meir Feder
TL;DR
This work introduces leave-one-out regret as a natural criterion for universal batch learning with log-loss in the individual setting, where the outcome sequence is deterministic. It characterizes the first-order minimax regret for three families of hypothesis classes: the multinomial class over $m$ symbols, with $R^*_{loo}=\frac{m-1}{N}+o\!\left(\frac{1}{N}\right)$; deterministic classes of finite VC dimension $d$, with $R^*_{loo}=O\!\left(\frac{d\log N}{N}\right)$ and matching lower bounds for certain constructions; and general probabilistic classes, also with $O\!\left(\frac{d\log N}{N}\right)$. The results show that universal batch learning with log-loss is possible in the individual setting, with regret bounds governed by structural properties of the hypothesis class such as its VC dimension and one-inclusion graph. The paper also contrasts this approach with existing pNML methods, exhibiting cases where pNML fails while the class remains learnable under the leave-one-out criterion.
Abstract
We study batch learning with log-loss in the individual setting, where the outcome sequence is deterministic. Because empirical statistics are not directly applicable in this regime, obtaining regret guarantees for batch learning has long posed a fundamental challenge. We propose a natural criterion based on leave-one-out regret and analyze its minimax value for several hypothesis classes. For the multinomial simplex over $m$ symbols, we show that the minimax regret is $\frac{m-1}{N} + o\!\left(\frac{1}{N}\right)$, and compare it to the stochastic realizable case where it is $\frac{m-1}{2N} + o\!\left(\frac{1}{N}\right)$. More generally, we prove that every hypothesis class of VC dimension $d$ is learnable in the individual batch-learning problem, with regret at most $\frac{d\log(N)}{N} + o\!\left(\frac{\log(N)}{N}\right)$, and we establish matching lower bounds for certain classes. We further derive additional upper bounds that depend on structural properties of the hypothesis class. These results establish, for the first time, that universal batch learning with log-loss is possible in the individual setting.
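As a rough numerical illustration of the multinomial rate, the sketch below evaluates a leave-one-out regret on a fixed sequence. It rests on two assumptions that are not taken from the paper: the regret is taken to be the average log-loss when each symbol is predicted from the other $N-1$ samples, minus the average log-loss of the best multinomial in hindsight; and the predictor is a simple add-$\beta$ (smoothed frequency) estimator, swapped in for illustration rather than the minimax-optimal learner. The function name `loo_regret_add_beta` and its parameters are hypothetical.

```python
import math
from collections import Counter


def loo_regret_add_beta(seq, m, beta=0.5):
    """Leave-one-out log-loss regret (in nats) of an add-beta estimator.

    Assumed definition (not quoted from the paper): the average over i of
    the log-loss on x_i when predicted from the other N-1 samples, minus
    the average log-loss of the best fixed multinomial in hindsight.
    """
    N = len(seq)
    counts = Counter(seq)
    regret = 0.0
    for n in counts.values():
        # Held-out prediction for this symbol: its count among the
        # remaining N-1 samples is n-1, smoothed by beta over m symbols.
        q = (n - 1 + beta) / (N - 1 + m * beta)
        # The best multinomial in hindsight assigns the empirical frequency.
        p = n / N
        # Each of the n occurrences contributes log(p/q) / N to the regret.
        regret += (n / N) * math.log(p / q)
    return regret


seq = [0, 1, 2] * 100  # deterministic sequence, N = 300, m = 3
N = len(seq)
print(N * loo_regret_add_beta(seq, m=3))  # compare with m - 1 = 2
```

For sequences in which every symbol count is large, a first-order expansion of the expression above gives $N$ times this quantity close to $m-1$ for any fixed $\beta > 0$, which is consistent with the stated rate. This is only a sanity check on one sequence, not a substitute for the minimax analysis.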
