Learning with Multiple Correct Answers -- A Trichotomy of Regret Bounds under Different Feedback Models

Alireza F. Pour; Farnam Mansouri; Shai Ben-David

TL;DR

This work studies online learning where each input can have multiple valid outputs, formalizing three feedback models: Mistake-Unknown, Mistake-Known, and Set-Valued. It introduces three Littlestone-type dimensions $\mathsf{LD}_{\text{unknown}}$, $\mathsf{LD}_{\text{known}}$, and $\mathsf{LD}_{\text{set}}$ to capture realizable learnability, revealing a trichotomy of agnostic regret: linear in $T$ for Mistake-Unknown, sublinear $O\big(\sqrt{T\log T\,\mathsf{LD}_{\text{known}}(\mathcal{H})\,|\mathcal{Y}|}\big)$ for Mistake-Known, and constant or $O\big(\sqrt{T\log T\,\mathsf{LD}_{\text{set}}(\mathcal{H})}\big)$ for Set-Valued. The paper provides realizable bounds matching $\min(\mathsf{LD}_*(\mathcal{H}),T)$ for each model, demonstrates linear lower bounds in the Mistake-Unknown case, and develops EXP4-based reductions and weighted-majority techniques to achieve sublinear and even constant regret in the agnostic and set-valued scenarios. It also connects online guarantees to batch learnability via online-to-batch reductions, yielding sample complexities governed by the corresponding dimensions, applicable to infinite hypothesis classes. Overall, the work establishes a sharp trichotomy across feedback models and clarifies when efficient learning is possible under multiple acceptable outputs.

Abstract

We study an online learning problem with multiple correct answers, where each instance admits a set of valid labels, and in each round the learner must output a valid label for the queried example. This setting is motivated by language generation tasks, in which a prompt may admit many acceptable completions, but not every completion is acceptable. We study this problem under three feedback models. For each model, we characterize the optimal mistake bound in the realizable setting using an appropriate combinatorial dimension. We then establish a trichotomy of regret bounds across the three models in the agnostic setting. Our results also imply sample complexity bounds for the batch setup that depend on the respective combinatorial dimensions.
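To make the protocol concrete, here is a minimal sketch of one realizable run under set-valued feedback. It rests on assumptions that are not taken from the paper: the hypothesis class is finite, each hypothesis maps an instance to a single label, a prediction counts as correct exactly when it lies in the revealed set of valid labels, and the learner is a simple plurality vote over the hypotheses still consistent with past feedback. The name `run_set_valued_learner` is hypothetical, and this is not claimed to be the paper's algorithm.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, List, Set, Tuple

# Assumption: a hypothesis maps an instance to one label.
Hypothesis = Callable[[Hashable], Hashable]


def run_set_valued_learner(
    hypotheses: Iterable[Hypothesis],
    rounds: Iterable[Tuple[Hashable, Set[Hashable]]],
) -> Tuple[int, List[Hypothesis]]:
    """Plurality vote over a shrinking version space (illustrative only).

    Each round is a pair (instance, set of valid labels). The set is
    revealed to the learner only after it has committed to a prediction.
    Assumes realizability: some hypothesis is valid in every round.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, valid_labels in rounds:
        # Predict the label chosen by the most surviving hypotheses.
        votes = Counter(h(x) for h in version_space)
        prediction = votes.most_common(1)[0][0]
        if prediction not in valid_labels:
            mistakes += 1
        # Set-valued feedback: drop every hypothesis whose label is invalid.
        version_space = [h for h in version_space if h(x) in valid_labels]
    return mistakes, version_space


# Toy class: three constant hypotheses over the labels {0, 1, 2}.
toy_class = [lambda x, c=c: c for c in range(3)]
toy_rounds = [("a", {1, 2}), ("b", {2})]
mistakes, survivors = run_set_valued_learner(toy_class, toy_rounds)
```

In this sketch, whenever the learner errs, every hypothesis that voted for the predicted label is removed, and the plurality label is backed by at least a $1/|\mathcal{Y}|$ fraction of the version space. This crude counting argument only applies to finite classes; the paper's dimension-based bounds are what cover infinite ones.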

Paper Structure (17 sections, 17 theorems, 58 equations, 4 algorithms)

Introduction
Related Works
Our Contributions
Setup
Notations:
A Comparison Between Notions of Regret
Combinatorial Parameters
A Characterization of Realizable Online Multi-Label Learning
Agnostic Online Multi-label Learning
A Gentle Start: An Example Hypothesis Class
Agnostic Online Learning Under Mistake-Known Feedback Model
Agnostic Online Learning Under Set-valued Feedback Model
A Batch Study of Learning from Multiple Correct Answers
Proof of Theorem \ref{thm:real-char}: Realizable Characterizations
Proof of Theorem \ref{thm:set-valued-const}: Constant Set-Valued Regret
...and 2 more sections

Key Result

Lemma 3.5

If $\mathsf{LD}_{\text{unknown}}(\mathcal{H}) \geq d$, then there exists a multi-label tree $\mathcal{T}$ that is $d$-shattered by $\mathcal{H}$, such that for every path $\sigma = (\hat{y}_1, \ldots, \hat{y}_T)$ from root to leaf there exists an $h_\sigma$ that in addition to satisfying properties (i) and (ii) also satisf

Theorems & Definitions (38)

Definition 2.1: Unknown Mistake Bound
Definition 2.2: Known Mistake Bound
Definition 2.3: Set-Valued Mistake Bound
Definition 2.4: Mistake-unknown Regret
Definition 2.5: Mistake-known Regret
Definition 2.6: Set-valued Regret
Example 1
Definition 3.1: Set-Valued Littlestone Dimension
Definition 3.2: Mistake Known Multi-Label Littlestone Dimension
Definition 3.3: Mistake Unknown Multi-Label Littlestone Dimension
...and 28 more
