
Improved Regret Bounds for Bandits with Expert Advice

Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya

TL;DR

This paper advances the theory of bandits with expert advice by establishing a matching lower bound under restricted feedback, showing $\sqrt{KT\ln(N/K)}$ to be the minimax rate for $N>K$ and thus proving optimality of the known restricted-feedback upper bound. It introduces a $q$-FTRL algorithm based on the negative $q$-Tsallis entropy to obtain a tight worst-case bound that scales with $K$ and $N$, and then enhances this with a doubling trick to achieve an instance-dependent regret bound that depends on the chi-squared capacity of the expert recommendations. An accompanying lower-bound result for the restricted-advice model is derived via a reduction from feedback-graph bandits, demonstrating that the instance-dependent bound is essentially tight in the worst case. The work also connects to the existing literature on PolyINF/EXP4 and offers a nuanced view of how expert agreement affects regret through capacity-based analyses.

Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
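To make the algorithmic idea concrete, here is a minimal sketch of $q$-FTRL over experts with the negative $q$-Tsallis entropy regularizer, combined with standard importance-weighted loss estimates. This is an illustrative reconstruction based only on the summary above, not the paper's exact algorithm: the bisection-based solver, the parameter defaults ($q=0.5$, $\eta=0.1$), and the helper names `tsallis_ftrl_weights` and `q_ftrl_expert_advice` are all assumptions for illustration.

```python
import numpy as np

def tsallis_ftrl_weights(cum_loss, eta, q):
    """FTRL step over the simplex with negative q-Tsallis entropy.
    The minimizer has the form p_i ∝ (eta*(1-q)*(cum_loss_i - x))^(-1/(1-q)),
    where x < min(cum_loss) is a normalizing multiplier found by bisection."""
    m = cum_loss.min()
    lo, hi = m - 1.0 / (eta * (1 - q)), m - 1e-12
    # push lo down until the total mass at lo drops below 1
    while ((eta * (1 - q) * (cum_loss - lo)) ** (-1 / (1 - q))).sum() > 1:
        lo = m - 2 * (m - lo)
    for _ in range(100):  # bisection on the multiplier x
        x = (lo + hi) / 2
        mass = ((eta * (1 - q) * (cum_loss - x)) ** (-1 / (1 - q))).sum()
        lo, hi = (x, hi) if mass < 1 else (lo, x)
    p = (eta * (1 - q) * (cum_loss - x)) ** (-1 / (1 - q))
    return p / p.sum()

def q_ftrl_expert_advice(advice_seq, loss_seq, q=0.5, eta=0.1, rng=None):
    """Bandits with expert advice via q-FTRL (illustrative sketch).
    advice_seq[t]: (N, K) row-stochastic matrix of expert recommendations;
    loss_seq[t]: length-K vector of action losses in [0, 1]."""
    rng = rng or np.random.default_rng(0)
    cum = np.zeros(advice_seq[0].shape[0])  # estimated cumulative expert losses
    total_loss = 0.0
    for E, loss in zip(advice_seq, loss_seq):
        p = tsallis_ftrl_weights(cum, eta, q)  # distribution over experts
        pa = p @ E                             # induced action distribution
        a = rng.choice(len(pa), p=pa / pa.sum())
        est = np.zeros(len(pa))
        est[a] = loss[a] / pa[a]               # importance-weighted estimate
        cum += E @ est                         # propagate to expert losses
        total_loss += loss[a]
    return total_loss
```

The key structural point the sketch illustrates is that the regularizer acts on the $N$-dimensional expert simplex while the bandit feedback arrives on the $K$ actions, so each observed loss is importance-weighted and mapped back to the experts through the advice matrix.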

Paper Structure

This paper contains 10 sections, 4 theorems, 41 equations, and 2 algorithms.

Key Result

Theorem 3.1

The $q$-FTRL algorithm (Algorithm 1), run with a suitable Tsallis parameter $q$ and learning rate, satisfies a worst-case regret bound of order $\sqrt{KT\ln(N/K)}$ (exact parameter choices and constants not reproduced in this summary).

Theorems & Definitions (8)

  • Theorem 3.1
  • proof
  • Theorem 4.1
  • proof
  • Theorem 5.1
  • proof
  • Lemma A.1
  • proof