Improved Regret Bounds for Bandits with Expert Advice
Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya
TL;DR
This paper advances the theory of bandits with expert advice in two directions. Under a restricted feedback model, it proves a lower bound of order $\sqrt{KT\ln(N/K)}$ on the worst-case regret via a reduction from bandits with feedback graphs; this matches a previously known upper bound, establishing $\sqrt{KT\ln(N/K)}$ as the minimax rate for $N>K$. Under the standard feedback model, it analyzes a $q$-FTRL algorithm based on the negative $q$-Tsallis entropy that attains a tight worst-case bound in terms of $K$ and $N$, and then combines it with a doubling trick to obtain an instance-dependent regret bound governed by the chi-squared capacity of the expert recommendations, a bound that is essentially tight in the worst case. The work also connects to the existing literature on PolyINF/EXP4 and offers a nuanced view, through capacity-based analyses, of how agreement among the experts affects regret.
Abstract
In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
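The $q$-FTRL update mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tsallis_ftrl_weights`, the particular normalization of the regularizer, and all numerical parameters are illustrative assumptions. The sketch only shows the core computation: minimizing the linearized loss plus the negative $q$-Tsallis entropy over the probability simplex reduces, via the first-order conditions, to finding a scalar normalizer by bisection.

```python
import numpy as np

def tsallis_ftrl_weights(cum_losses, eta, q=0.5, tol=1e-12, max_iter=200):
    """Sketch of one q-FTRL step: minimize
        eta * <w, L>  +  (1 - sum_i w_i^q) / (1 - q)
    over the probability simplex (the second term is the negative
    q-Tsallis entropy, 0 < q < 1). The first-order conditions give
        w_i = ( q / ((1 - q) * eta * (L_i - x)) )^(1 / (1 - q))
    for a scalar x < min_i L_i fixed by sum_i w_i = 1; the total weight
    is strictly increasing in x, so bisection finds the normalizer.
    """
    L = np.asarray(cum_losses, dtype=float)
    coef = q / ((1.0 - q) * eta)
    power = 1.0 / (1.0 - q)

    def total(x):
        return np.sum((coef / (L - x)) ** power)

    hi = L.min() - 1e-12          # just below the pole, total(hi) is huge
    lo = hi - 1.0
    while total(lo) > 1.0:        # widen the bracket until total(lo) < 1
        lo = hi - 2.0 * (hi - lo)
    for _ in range(max_iter):     # bisection on the monotone normalizer
        mid = 0.5 * (lo + hi)
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    w = (coef / (L - 0.5 * (lo + hi))) ** power
    return w / w.sum()            # absorb residual bisection error
```

As $q \to 1$ the regularizer recovers the negative Shannon entropy (exponential weights, as in EXP4), while $q = 1/2$ gives the Tsallis-INF/PolyINF-style update.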
