
Learning with Multiple Correct Answers -- A Trichotomy of Regret Bounds under Different Feedback Models

Alireza F. Pour, Farnam Mansouri, Shai Ben-David

TL;DR

This work studies online learning where each input can have multiple valid outputs, formalizing three feedback models: Mistake-Unknown, Mistake-Known, and Set-Valued. It introduces three Littlestone-type dimensions, $\mathsf{LD}_{\text{unknown}}$, $\mathsf{LD}_{\text{known}}$, and $\mathsf{LD}_{\text{set}}$, that characterize realizable learnability in the respective models, and it reveals a trichotomy of agnostic regret: linear in $T$ for Mistake-Unknown, sublinear $O\big(\sqrt{T\log T\,\mathsf{LD}_{\text{known}}(\mathcal{H})\,|\mathcal{Y}|}\big)$ for Mistake-Known, and constant or $O\big(\sqrt{T\log T\,\mathsf{LD}_{\text{set}}(\mathcal{H})}\big)$ for Set-Valued. In the realizable setting, the optimal mistake bound in each model is $\min(\mathsf{LD}_*(\mathcal{H}),T)$, where $\mathsf{LD}_*$ is the dimension associated with that model. The paper proves linear regret lower bounds for Mistake-Unknown and uses EXP4-based reductions and weighted-majority techniques to obtain sublinear regret for Mistake-Known and sublinear, or even constant, regret for Set-Valued. It also connects the online guarantees to batch learnability via online-to-batch reductions, yielding sample complexity bounds that are governed by the corresponding dimensions and apply to infinite hypothesis classes. Overall, the work establishes a sharp trichotomy across feedback models and clarifies when low-regret learning is possible under multiple acceptable outputs.
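The TL;DR attributes the Mistake-Known upper bound to EXP4-based reductions. The sketch below is plain EXP4 over a finite class, not the paper's reduction for infinite classes: hypotheses act as experts, labels act as arms, and the learner updates only on the mistake bit $\mathbb{1}[\hat{y}_t \notin S_t]$, where $S_t$ is the set of valid labels in round $t$. Treating that bit as the usable Mistake-Known feedback is an assumption. The function name `exp4_mistake_known` and the parameters `eta` (learning rate) and `gamma` (uniform exploration rate) are illustrative, and every expert is assumed to output a label from `labels`.

```python
import math
import random
from typing import Callable, FrozenSet, Hashable, Sequence, Tuple


def exp4_mistake_known(
    experts: Sequence[Callable[[Hashable], Hashable]],
    labels: Sequence[Hashable],
    stream: Sequence[Tuple[Hashable, FrozenSet[Hashable]]],
    eta: float,
    gamma: float,
    seed: int = 0,
) -> int:
    """EXP4 with experts = hypotheses and arms = labels; returns the learner's mistakes.

    Hypothetical sketch: only the bit 1[y_hat not in S_t] drives the updates.
    """
    rng = random.Random(seed)
    k = len(labels)
    log_w = [0.0] * len(experts)  # log-weights avoid underflow over long streams
    mistakes = 0
    for x, valid in stream:
        advice = [h(x) for h in experts]
        shift = max(log_w)
        w = [math.exp(lw - shift) for lw in log_w]
        total = sum(w)
        # Mix the weighted expert votes with uniform exploration over labels.
        p = {y: gamma / k for y in labels}
        for wi, y in zip(w, advice):
            p[y] += (1 - gamma) * wi / total
        y_hat = rng.choices(list(p), weights=list(p.values()))[0]
        loss = 0.0 if y_hat in valid else 1.0  # the only feedback used
        mistakes += int(loss)
        # Importance-weighted loss estimate; nonzero only for experts that advised y_hat.
        est = loss / p[y_hat]
        for i, y in enumerate(advice):
            if y == y_hat:
                log_w[i] -= eta * est
    return mistakes
```

With $N$ experts and $K$ arms, EXP4's standard regret guarantee is $O(\sqrt{TK\log N})$, the same $\sqrt{T\,|\mathcal{Y}|}$ shape as the Mistake-Known bound. In standard agnostic online learning, a class of Littlestone dimension $d$ is covered by roughly $T^{O(d)}$ experts, so $\log N$ becomes $O(d\log T)$; the $\log T\,\mathsf{LD}_{\text{known}}(\mathcal{H})$ factor plausibly arises the same way, though this summary does not say so.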

Abstract

We study an online learning problem with multiple correct answers, where each instance admits a set of valid labels, and in each round the learner must output a valid label for the queried example. This setting is motivated by language generation tasks, in which a prompt may admit many acceptable completions, but not every completion is acceptable. We study this problem under three feedback models. For each model, we characterize the optimal mistake bound in the realizable setting using an appropriate combinatorial dimension. We then establish a trichotomy of regret bounds across the three models in the agnostic setting. Our results also imply sample complexity bounds for the batch setup that depend on the respective combinatorial dimensions.
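As a hedged illustration of how the three feedback models differ, the sketch below encodes what the learner observes after predicting $\hat{y}_t$ when the set of valid labels is $S_t$. The encoding is an assumption inferred from the model names, not taken from the paper's definitions: under Mistake-Unknown the learner sees one valid label but not whether $\hat{y}_t$ was valid; under Mistake-Known it also learns whether $\hat{y}_t$ was valid; under Set-Valued it sees the whole set $S_t$. The names `Feedback`, `feedback`, and the model strings are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Optional

Label = Hashable


@dataclass(frozen=True)
class Feedback:
    """What the learner observes after one round (hypothetical encoding)."""
    correct_label: Optional[Label] = None          # one valid label
    mistake: Optional[bool] = None                 # whether y_hat was invalid
    valid_set: Optional[FrozenSet[Label]] = None   # the full set S_t


def feedback(model: str, y_hat: Label, valid_set: FrozenSet[Label], revealed: Label) -> Feedback:
    """Return the observation for `model` in {"unknown", "known", "set"}."""
    if revealed not in valid_set:
        raise ValueError("the revealed label must be valid")
    if model == "unknown":
        # The learner never finds out whether y_hat itself was acceptable.
        return Feedback(correct_label=revealed)
    if model == "known":
        return Feedback(correct_label=revealed, mistake=y_hat not in valid_set)
    if model == "set":
        # The mistake bit is implied: the learner can check y_hat against S_t itself.
        return Feedback(valid_set=valid_set)
    raise ValueError(f"unknown feedback model: {model!r}")
```

For example, with $S_t = \{a, b\}$ and $\hat{y}_t = c$, `feedback("unknown", "c", S, "a")` carries only the label `a`, so the learner cannot tell that `c` was wrong. This is consistent with the linear regret lower bound for that model.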

Paper Structure (17 sections, 17 theorems, 58 equations, 4 algorithms)


Key Result

Lemma 3.5

If $\mathsf{LD}_{\text{unknown}}(\mathcal{H}) \geq d$, then there exists a multi-label tree $\mathcal{T}$ that is $d$-shattered by $\mathcal{H}$ such that, for every root-to-leaf path $\sigma = (\hat{y}_1, \ldots, \hat{y}_T)$, there exists an $h_\sigma$ that, in addition to satisfying properties (i) and (ii), also satisfies …

Theorems & Definitions (38)

  • Definition 2.1: Unknown Mistake Bound
  • Definition 2.2: Known Mistake Bound
  • Definition 2.3: Set-Valued Mistake Bound
  • Definition 2.4: Mistake-Unknown Regret
  • Definition 2.5: Mistake-Known Regret
  • Definition 2.6: Set-Valued Regret
  • Example 1
  • Definition 3.1: Set-Valued Littlestone Dimension
  • Definition 3.2: Mistake-Known Multi-Label Littlestone Dimension
  • Definition 3.3: Mistake-Unknown Multi-Label Littlestone Dimension
  • ...and 28 more
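To make the set-valued mistake bound concrete for a finite class, here is a textbook-style learner, not the paper's $\mathsf{LD}_{\text{set}}$-based algorithm: predict the plurality label of the current version space, then use the revealed set $S_t$ to discard every hypothesis $h$ with $h(x_t) \notin S_t$. On a mistake, the plurality label carries at least a $1/|\mathcal{Y}|$ fraction of the version space and all of those hypotheses are discarded, so in the realizable case the learner makes at most $\ln|\mathcal{H}| / \ln\frac{|\mathcal{Y}|}{|\mathcal{Y}|-1} \le |\mathcal{Y}|\ln|\mathcal{H}|$ mistakes (for $|\mathcal{Y}| \ge 2$). The function names are illustrative.

```python
import math
from collections import Counter
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple

Hypothesis = Callable[[Hashable], Hashable]


def plurality_version_space(
    hypotheses: Sequence[Hypothesis],
    stream: Sequence[Tuple[Hashable, FrozenSet[Hashable]]],
) -> int:
    """Run the learner over (x_t, S_t) pairs and return its number of mistakes."""
    version_space: List[Hypothesis] = list(hypotheses)
    mistakes = 0
    for x, valid in stream:
        votes = Counter(h(x) for h in version_space)
        y_hat = votes.most_common(1)[0][0]  # plurality label; ties go to the first seen
        if y_hat not in valid:
            mistakes += 1
        # Set-valued feedback: keep only hypotheses whose label lies in S_t.
        version_space = [h for h in version_space if h(x) in valid]
        if not version_space:
            raise ValueError("stream is not realizable by the hypothesis class")
    return mistakes


def mistake_bound(num_hypotheses: int, num_labels: int) -> float:
    """Realizable bound ln|H| / ln(|Y| / (|Y| - 1)) for the learner above."""
    if num_labels < 2:
        raise ValueError("need at least two labels")
    return math.log(num_hypotheses) / math.log(num_labels / (num_labels - 1))
```

Because the learner sees all of $S_t$, it can prune the version space every round, not only after mistakes; the bound above counts only the guaranteed shrinkage on mistake rounds.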