Ambiguous Online Learning
Vanessa Kosoy
TL;DR
Ambiguous Online Learning extends classical online learning to multivalued hypotheses and predictions. On input $x$, a prediction set $\alpha$ is correct when $y\in\alpha\subseteq h^*(x)$, where $y$ is the true label and $h^*$ is the true hypothesis. The paper develops the Ambiguous Littlestone framework and introduces the invariants $\mathrm{AL}(\mathcal{H})$, $\underline{\mathrm{PD}}(\mathcal{H})$, $\overline{\mathrm{PD}}(\mathcal{H})$ and $L_{\mathfrak{P}}(\mathcal{H})$ to characterize minimax mistake bounds. The main result is a trichotomy: over $N$ rounds, the optimal mistake bound $\mathcal{M}^*_{\mathcal{H}}(N)$ is in $\Theta(1)$, $\tilde{\Theta}(\sqrt{N})$ or $\Theta(N)$. The $\Theta(1)$ case is characterized by $\mathrm{AL}(\mathcal{H})<\infty$, and the $\tilde{\Theta}(\sqrt{N})$ case is governed by $\underline{\mathrm{PD}}(\mathcal{H})$ and $L_{\mathfrak{P}}(\mathcal{H})$. The paper introduces the Ambiguous Optimal Algorithm (AOA), which achieves the $O(1)$ bound, and the Weighted Aggregation Algorithm (WAA), which achieves the $\tilde{O}(\sqrt{N})$ bound for finite classes (with extensions to infinite classes via reductions). It also connects ambiguous online learning (AOL) to apple tasting and to online learning of partial functions, establishing lower bounds and tightness results. Together, these results clarify how ambiguity and multivalued predictions affect optimal mistake bounds, and suggest algorithms for multivalued prediction tasks such as multivalued dynamical systems, recommendation and lossless compression.
Abstract
We propose a new variant of online learning that we call "ambiguous online learning". In this setting, the learner is allowed to produce multiple predicted labels. Such an "ambiguous prediction" is considered correct when at least one of the labels is correct, and none of the labels are "predictably wrong". The definition of "predictably wrong" comes from a hypothesis class in which hypotheses are also multivalued. Thus, a prediction is "predictably wrong" if it's not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called "apple tasting". We show that in this setting, there is a trichotomy of mistake bounds: up to logarithmic factors, any hypothesis class has an optimal mistake bound of either $\Theta(1)$, $\Theta(\sqrt{N})$ or $N$.
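The correctness rule ($y\in\alpha\subseteq h^*(x)$) and the online protocol can be made concrete with a small simulation. The sketch below rests on assumptions that the abstract does not state: the hypothesis class is finite and realizable (some hypothesis equals $h^*$ and every observed $y$ lies in $h^*(x)$), and after each round the learner observes $y$ but never learns whether $\alpha\subseteq h^*(x)$. The learner `run_intersection_learner` is a hypothetical baseline, not the paper's AOA or WAA. It predicts the intersection of $h(x)$ over all hypotheses still consistent with past labels.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple

Label = Hashable
# A multivalued hypothesis maps an input x to the set of labels it allows.
Hypothesis = Callable[[Hashable], FrozenSet[Label]]


def is_correct(alpha: FrozenSet[Label], y: Label, allowed: FrozenSet[Label]) -> bool:
    """An ambiguous prediction is correct iff y is in alpha and alpha is a subset of h*(x)."""
    return y in alpha and alpha <= allowed


def run_intersection_learner(
    hypotheses: Sequence[Hypothesis],
    h_star: Hypothesis,
    stream: Sequence[Tuple[Hashable, Label]],
) -> int:
    """Play the protocol and return the number of mistakes.

    Hypothetical baseline (NOT the paper's AOA or WAA). Assumes realizability
    and that the learner sees only y after each round.
    """
    version_space: List[Hypothesis] = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        # Labels allowed by every surviving hypothesis. Since h* always survives,
        # this set is a subset of h*(x), so alpha is never "predictably wrong".
        alpha = frozenset.intersection(*(h(x) for h in version_space))
        # The environment, not the learner, scores the round using h*.
        if not is_correct(alpha, y, h_star(x)):
            mistakes += 1
        # Discard hypotheses that forbid the observed label.
        version_space = [h for h in version_space if y in h(x)]
    return mistakes


# Hypothetical toy class over integer inputs and labels {"a", "b", "c"}.
def h1(x: int) -> FrozenSet[str]:
    return frozenset({"a", "b"})


def h2(x: int) -> FrozenSet[str]:
    return frozenset({"a"}) if x % 2 == 0 else frozenset({"a", "b", "c"})


def h3(x: int) -> FrozenSet[str]:
    return frozenset({"b", "c"})


print(run_intersection_learner([h1, h2, h3], h1, [(0, "b"), (1, "a"), (2, "a"), (3, "b")]))  # prints 2
```

Under these assumptions, every mistake this baseline makes has the form $y\notin\alpha$. Each such mistake therefore removes at least one hypothesis, so the baseline makes at most $|\mathcal{H}|-1$ mistakes. The price is that it can predict very small (even empty) sets whenever the surviving hypotheses disagree.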
