On the Minimax Regret in Online Ranking with Top-k Feedback

Mingyuan Zhang; Ambuj Tewari

TL;DR

This work characterizes the minimax regret of online ranking with top-$k$ feedback in a non-contextual setting, for three ranking measures: Pairwise Loss (PL), Discounted Cumulative Gain (DCG), and Precision@n (P@n). It uses the theory of finite partial monitoring games to classify each game by its observability properties, which yields the minimax regret rate for every $k$ and every measure. For P@n, the rate is $\Theta(T^{1/2})$ for all $k$. For PL and DCG (and SL), the rate moves from $\Theta(T^{2/3})$ to $\Theta(T^{1/2})$ as $k$ grows to $m-1$, where $m$ is the number of items, which shows how richer feedback lowers regret. The proposed NeighborhoodWatch2 variant attains the minimax rate for P@n in polynomial time.

Abstract

In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top $k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top $k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates with the top $k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n.
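
The protocol described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's algorithm: it assumes binary relevance vectors, represents a ranking as a list of item indices in rank order, uses a normalized form of Precision@n (the paper's exact definition may differ), and plugs in a uniformly random learner and randomly drawn relevance vectors as placeholders. All function names are hypothetical.

```python
import random
from itertools import permutations


def top_k_feedback(ranking, relevance, k):
    """Relevance scores of the k highest-ranked items; the rest stay hidden."""
    return [relevance[item] for item in ranking[:k]]


def precision_at_n(ranking, relevance, n):
    """Assumed gain: fraction of the n highest-ranked items that are relevant."""
    return sum(relevance[item] for item in ranking[:n]) / n


def regret_precision_at_n(rankings, relevances, n):
    """Gain of the best fixed ranking in hindsight minus the learner's gain."""
    m = len(relevances[0])
    learner_gain = sum(
        precision_at_n(r, rel, n) for r, rel in zip(rankings, relevances)
    )
    # Brute force over all m! rankings; only feasible for very small m.
    best_fixed_gain = max(
        sum(precision_at_n(p, rel, n) for rel in relevances)
        for p in permutations(range(m))
    )
    return best_fixed_gain - learner_gain


def simulate(m=4, k=2, n=2, T=50, seed=0):
    """Run T rounds with a placeholder learner and placeholder relevance draws."""
    rng = random.Random(seed)
    rankings, relevances, observed = [], [], []
    for _ in range(T):
        ranking = rng.sample(range(m), m)  # placeholder: uniform random ranking
        relevance = [rng.randint(0, 1) for _ in range(m)]  # stand-in adversary
        rankings.append(ranking)
        relevances.append(relevance)
        # The learner would update from this partial observation only.
        observed.append(top_k_feedback(ranking, relevance, k))
    return regret_precision_at_n(rankings, relevances, n), observed
```

The brute-force maximum over all rankings is used only to keep the regret definition explicit. A real learner would have to choose its next ranking from the `observed` feedback alone, which is what makes the top-$k$ setting a partial monitoring problem.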

Paper Structure (15 sections, 15 theorems, 43 equations, 1 figure, 4 tables, 1 algorithm)

Introduction
Notations and Problem Setup
Ranking Measures
Summary of Results
Finite Partial Monitoring Games
A Quick Review of Finite Partial Monitoring Games
Classification Theorem for Finite Partial Monitoring Games
Minimax Regret Rates for PL, SL and DCG
Minimax Regret Rate for P@n
Efficient Algorithm for Obtaining Minimax Regret Rate for P@n
Conclusion
Proofs for the Section "Finite Partial Monitoring Games"
Proofs for the Section "Minimax Regret Rates for PL, SL and DCG"
Proofs for the Section "Minimax Regret Rate for P@n"
Proofs for the Section "Efficient Algorithm for Obtaining Minimax Regret Rate for P@n"

Key Result

Lemma 1

The alternative definitions of global observability and local observability (Definition 8) are equivalent to the original definitions of global observability and local observability (Definition 6 and Definition 7), respectively.

Figures (1)

Figure 1: Illustration of the proof of Lemma \ref{lem:local_regret}, adapted from Lattimore18.

Theorems & Definitions (39)

Remark 1
Definition 1: Optimal action
Definition 2: Cell decomposition
Definition 3: Classification of actions
Definition 4: Neighbors
Definition 5: Signal matrix
Definition 6: Global observability
Definition 7: Local observability
Definition 8: Alternative definitions of global observability and local observability
Lemma 1
...and 29 more
