On the Minimax Regret in Online Ranking with Top-k Feedback
Mingyuan Zhang, Ambuj Tewari
TL;DR
This work characterizes the minimax regret of online ranking with top-$k$ partial feedback over $T$ rounds, in a non-contextual setting, for three performance measures: Pairwise Loss (PL), Discounted Cumulative Gain (DCG), and Precision@n (P@n). Using the theory of finite partial monitoring to classify the observability properties of each problem, it gives complete minimax-rate characterizations for all $k$ and all three measures. For P@n, the minimax regret is $\Theta(T^{1/2})$ for every $k$, and a polynomial-time variant of NeighborhoodWatch2 attains this rate. For PL and DCG (and SL), the rate moves from $\Theta(T^{2/3})$ to $\Theta(T^{1/2})$ as $k$ grows to $m-1$, where $m$ is the number of items being ranked, which clarifies how the richness of the feedback affects the achievable regret.
Abstract
In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top-$k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top-$k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates under the top-$k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n.
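To make the feedback model and the three measures concrete, the following is a minimal Python sketch under stated assumptions. It uses common textbook forms of Pairwise Loss, DCG, and Precision@n (unnormalized, with DCG gain $2^{r}-1$ and a $\log_2$ position discount); the paper's exact definitions and normalizations may differ. The function names and the representation of a ranking as a list of item indices are illustrative choices, not taken from the paper.

```python
import math

def top_k_feedback(ranking, relevance, k):
    """Relevance scores revealed to the learner: only the first k ranked items."""
    return [relevance[item] for item in ranking[:k]]

def pairwise_loss(ranking, relevance):
    """Count pairs in which a less relevant item is ranked above a more relevant one."""
    loss = 0
    for a in range(len(ranking)):
        for b in range(a + 1, len(ranking)):
            if relevance[ranking[a]] < relevance[ranking[b]]:
                loss += 1
    return loss

def dcg(ranking, relevance):
    """Discounted Cumulative Gain (assumed form: gain 2^r - 1, discount log2(position + 1))."""
    return sum(
        (2 ** relevance[item] - 1) / math.log2(pos + 2)  # pos is 0-based
        for pos, item in enumerate(ranking)
    )

def precision_at_n(ranking, relevance, n):
    """Number of relevant items among the top n (binary relevance, unnormalized)."""
    return sum(relevance[item] for item in ranking[:n])

# Hypothetical round with m = 4 items: items 0 and 2 are relevant.
relevance = [1, 0, 1, 0]
ranking = [1, 0, 2, 3]  # item 1 is placed first, then items 0, 2, 3

feedback = top_k_feedback(ranking, relevance, k=2)
pl = pairwise_loss(ranking, relevance)
gain = dcg(ranking, relevance)
p_at_2 = precision_at_n(ranking, relevance, n=2)
```

With $k=2$ the learner sees only the relevance of the two items it ranked first, even though each measure depends on the full relevance vector; this gap between what is observed and what is scored is what makes the problem a partial monitoring game.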
