Online Learning with Sublinear Best-Action Queries

Matteo Russo; Andrea Celli; Riccardo Colini Baldeschi; Federico Fusco; Daniel Haimovich; Dima Karamshuk; Stefano Leonardi; Niek Tax

TL;DR

This work studies online learning with a budget of at most $k$ best-action queries, where an oracle reveals the identity of the best action at a given time step. In the full-feedback model, a Hedge-based approach with uniformly random queries achieves a minimax regret of $\Theta(\min\{\sqrt{T}, T/k\})$, with matching lower bounds, revealing a multiplicative gain from querying. In the label-efficient setting, updating only at queried times yields $\Theta(\min\{T/\sqrt{k}, T^2/k^2\})$ regret with tight lower bounds; in the stochastic i.i.d. setting, Follow-The-Leader and Explore-Then-Commit variants achieve $\Theta(T/k)$ or $\tilde{\Theta}(\sqrt{T})$ depending on $k$, with improvements over standard label-efficient rates. Together, the results show that even sublinear query budgets can substantially improve regret, and they establish precise minimax rates under full and partial feedback as well as in the stochastic regime, with potential practical impact for budget-constrained prediction tasks.
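To make the full-feedback idea concrete, here is a minimal Python sketch of Hedge combined with query rounds drawn uniformly at random. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `hedge_with_random_queries`, the list-of-lists input format, the `eta` and `seed` parameters, and the choice to report expected loss instead of sampling an action are all hypothetical, and the learning rate is left to the caller.

```python
import math
import random


def hedge_with_random_queries(losses, k, eta, seed=0):
    """Sketch: full-feedback Hedge with k uniformly random best-action queries.

    losses: list of T rounds, each a list of n losses in [0, 1]
            (hypothetical input format, chosen for illustration).
    Returns (expected loss of the learner, loss of the best fixed action).
    """
    rng = random.Random(seed)
    T, n = len(losses), len(losses[0])
    # Assumption: the k query rounds are k distinct rounds sampled up front.
    query_rounds = set(rng.sample(range(T), k))
    cum = [0.0] * n  # cumulative loss of each action so far
    total = 0.0
    for t, loss_t in enumerate(losses):
        if t in query_rounds:
            # The oracle reveals the best action of this round; play it.
            total += min(loss_t)
        else:
            # Hedge distribution: weights proportional to exp(-eta * cum loss).
            # Subtracting the minimum keeps the exponentials numerically stable.
            m = min(cum)
            w = [math.exp(-eta * (c - m)) for c in cum]
            z = sum(w)
            total += sum(wi / z * li for wi, li in zip(w, loss_t))
        # Full feedback: the whole loss vector is observed in every round.
        for i in range(n):
            cum[i] += loss_t[i]
    return total, min(cum)
```

Regret in this sketch is the first return value minus the second. Because the weights are updated in every round regardless of whether a query was issued, the queried rounds can only lower the learner's loss relative to plain Hedge with the same `eta`.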

Abstract

In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred. Following the recent line of research on algorithms endowed with additional predictive features, we revisit this problem by allowing the decision maker to acquire additional information on the actions to be selected. In particular, we study the power of best-action queries, which reveal beforehand the identity of the best action at a given time step. In practice, predictive features may be expensive, so we allow the decision maker to issue at most $k$ such queries. We establish tight bounds on the performance any algorithm can achieve when given access to $k$ best-action queries for different types of feedback models. In particular, we prove that in the full feedback model, $k$ queries are enough to achieve an optimal regret of $\Theta\left(\min\left\{\sqrt{T}, \frac{T}{k}\right\}\right)$. This finding highlights the significant multiplicative advantage in the regret rate achievable with even a modest (sublinear) number $k \in \Omega(\sqrt{T})$ of queries. Additionally, we study the challenging setting in which the only available feedback is obtained during the time steps corresponding to the $k$ best-action queries. There, we provide a tight regret rate of $\Theta\left(\min\left\{\frac{T}{\sqrt{k}}, \frac{T^2}{k^2}\right\}\right)$, which improves over the standard $\Theta\left(\frac{T}{\sqrt{k}}\right)$ regret rate for label efficient prediction for $k \in \Omega(T^{2/3})$.
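The two thresholds quoted in the abstract can be read off by comparing the terms inside each minimum. The following short check ignores constants and logarithmic factors:

```latex
\begin{align*}
\frac{T}{k} \le \sqrt{T}
  &\iff k \ge \sqrt{T}, \\
\frac{T^2}{k^2} \le \frac{T}{\sqrt{k}}
  &\iff k^{3/2} \ge T
  \iff k \ge T^{2/3}.
\end{align*}
```

As an illustrative order-of-magnitude example (the numbers are chosen here, not taken from the paper), take $T = 10^6$ and $k = 10^5$. In the full feedback model, $\sqrt{T} = 10^3$ while $T/k = 10$. In the label efficient model, $T/\sqrt{k} \approx 3162$ while $T^2/k^2 = 100$.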

Paper Structure (25 sections, 14 theorems, 79 equations, 3 algorithms)

Introduction
Our Model
Our Results
Full feedback.
Label efficient feedback.
Stochastic i.i.d. setting.
Technical Challenges
Related Work
Correlated hints.
Queries/Ordinal hints.
Algorithms with predictions.
Hedge with $k$ Best-Action Queries
Full Feedback: An $O(\frac{T\log n}{k})$ Regret Bound
Label Efficient Feedback: An $O(\frac{T^2\log n}{k^2})$ Regret Bound
Algorithm description.
...and 10 more sections

Key Result

Lemma 3.0

Consider the Hedge algorithm $\mathrm{Hedge}_\eta(\tilde{\boldsymbol{\ell}})$ run on loss sequence $\tilde{\boldsymbol{\ell}} \in [0,U]^{n \times T}$ with learning rate $\eta < 1/U$. Then, for every action $i \in [n]$, it holds that, where $\tilde{L}_T(\mathrm{Hedge}_\eta(\tilde{\boldsymbol{\ell}})) = \sum_{t \in [T], i \in [n]} p_t(i) \cdot \tilde{\ell}_t(i)$ is the expected cumul

Theorems & Definitions (29)

Lemma 3.0
proof
Theorem 3.1
proof: Proof of Theorem \ref{thm:adv2}
Theorem 3.3
Lemma 3.4
proof
proof: Proof of Theorem \ref{thm:adv3}
Lemma 4.1
proof
...and 19 more
