Learning with Multiple Correct Answers -- A Trichotomy of Regret Bounds under Different Feedback Models
Alireza F. Pour, Farnam Mansouri, Shai Ben-David
TL;DR
This work studies online learning in which each input can have multiple valid outputs, and formalizes three feedback models: Mistake-Unknown, Mistake-Known, and Set-Valued. It introduces three Littlestone-type dimensions, $\mathsf{LD}_{\text{unknown}}$, $\mathsf{LD}_{\text{known}}$, and $\mathsf{LD}_{\text{set}}$, that characterize realizable learnability in the respective models, with realizable mistake bounds matching $\min(\mathsf{LD}_*(\mathcal{H}),T)$. In the agnostic setting it reveals a trichotomy of regret: linear in $T$ for Mistake-Unknown, sublinear $O\big(\sqrt{T\log T\,\mathsf{LD}_{\text{known}}(\mathcal{H})\,|\mathcal{Y}|}\big)$ for Mistake-Known, and either constant or $O\big(\sqrt{T\log T\,\mathsf{LD}_{\text{set}}(\mathcal{H})}\big)$ for Set-Valued. The paper proves linear lower bounds in the Mistake-Unknown case, and uses EXP4-based reductions and weighted-majority techniques to obtain the sublinear and constant regret bounds. Online-to-batch reductions then yield batch sample complexities governed by the corresponding dimensions, which also apply to infinite hypothesis classes.
Abstract
We study an online learning problem with multiple correct answers, where each instance admits a set of valid labels, and in each round the learner must output a valid label for the queried example. This setting is motivated by language generation tasks, in which a prompt may admit many acceptable completions, but not every completion is acceptable. We study this problem under three feedback models. For each model, we characterize the optimal mistake bound in the realizable setting using an appropriate combinatorial dimension. We then establish a trichotomy of regret bounds across the three models in the agnostic setting. Our results also imply sample complexity bounds for the batch setup that depend on the respective combinatorial dimensions.
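To make the protocol concrete, the following is a minimal sketch of the realizable setting when the learner is told only whether its prediction was valid. It is an illustration, not the paper's algorithm: the finite hypothesis class, the dict-based representation, and the name `plurality_learner` are all assumptions made for this example. The learner predicts the plurality label of the hypotheses still under consideration and, after a mistake, discards every hypothesis that voted for the rejected label. A correct prediction eliminates nothing, since hypotheses that voted for a different label may also be valid.

```python
from collections import Counter


def plurality_learner(hypotheses, rounds):
    """Run a plurality-vote elimination learner and return its mistake count.

    hypotheses: list of dicts mapping instance -> label (hypothetical finite class).
    rounds: list of (x, valid_labels) pairs. The set valid_labels is hidden from
            the learner; it only observes whether its own prediction was valid.
    Assumes realizability: some hypothesis outputs a valid label in every round.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, valid_labels in rounds:
        # Predict the label that most surviving hypotheses output on x.
        votes = Counter(h[x] for h in version_space)
        prediction = votes.most_common(1)[0][0]

        # The only feedback revealed: was the prediction a valid label?
        correct = prediction in valid_labels
        if not correct:
            mistakes += 1
            # Every hypothesis that output the rejected label is wrong on x.
            version_space = [h for h in version_space if h[x] != prediction]
    return mistakes


# Toy run: three hypotheses over instances {0, 1}; the first one is always valid.
toy_class = [{0: "a", 1: "a"}, {0: "b", 1: "a"}, {0: "b", 1: "c"}]
toy_rounds = [(0, {"a"}), (1, {"a", "c"})]
toy_mistakes = plurality_learner(toy_class, toy_rounds)
```

Under the realizability assumption the valid hypothesis is never discarded, and each mistake removes at least a $1/k$ fraction of the surviving hypotheses, where $k$ is the number of distinct labels. This gives a mistake bound that is logarithmic in the class size for fixed $k$, which is cruder than the dimension-based bounds the paper establishes.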
