Improved Regret Bounds for Bandits with Expert Advice
Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya
TL;DR
This paper advances the theory of bandits with expert advice in two directions. Under a restricted feedback model, it proves a lower bound of order $\sqrt{KT\ln(N/K)}$ on the worst-case regret via a reduction from bandits with feedback graphs; this matches a previously known upper bound, establishing $\sqrt{KT\ln(N/K)}$ as the minimax rate for $N>K$. Under the standard feedback model, it analyzes a $q$-FTRL algorithm based on the negative $q$-Tsallis entropy that attains a tight worst-case bound in terms of $K$ and $N$, and then combines it with a doubling trick to obtain an instance-dependent regret bound governed by the chi-squared capacity of the expert recommendations, a bound that is essentially tight in the worst case. The work also connects to the existing literature on PolyINF/EXP4 and offers a nuanced view, through capacity-based analyses, of how agreement among the experts affects regret.
Abstract
In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
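The $q$-FTRL update mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tsallis_ftrl_weights`, the particular normalization of the regularizer, and all numerical parameters are illustrative assumptions. The sketch only shows the core computation: minimizing the linearized loss plus the negative $q$-Tsallis entropy over the probability simplex reduces, via the first-order conditions, to finding a scalar normalizer by bisection.

```python
import numpy as np

def tsallis_ftrl_weights(cum_losses, eta, q=0.5, tol=1e-12, max_iter=200):
    """Sketch of one q-FTRL step: minimize
        eta * <w, L>  +  (1 - sum_i w_i^q) / (1 - q)
    over the probability simplex (the second term is the negative
    q-Tsallis entropy, 0 < q < 1). The first-order conditions give
        w_i = ( q / ((1 - q) * eta * (L_i - x)) )^(1 / (1 - q))
    for a scalar x < min_i L_i fixed by sum_i w_i = 1; the total weight
    is strictly increasing in x, so bisection finds the normalizer.
    """
    L = np.asarray(cum_losses, dtype=float)
    coef = q / ((1.0 - q) * eta)
    power = 1.0 / (1.0 - q)

    def total(x):
        return np.sum((coef / (L - x)) ** power)

    hi = L.min() - 1e-12          # just below the pole, total(hi) is huge
    lo = hi - 1.0
    while total(lo) > 1.0:        # widen the bracket until total(lo) < 1
        lo = hi - 2.0 * (hi - lo)
    for _ in range(max_iter):     # bisection on the monotone normalizer
        mid = 0.5 * (lo + hi)
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    w = (coef / (L - 0.5 * (lo + hi))) ** power
    return w / w.sum()            # absorb residual bisection error
```

As $q \to 1$ the regularizer recovers the negative Shannon entropy (exponential weights, as in EXP4), while $q = 1/2$ gives the Tsallis-INF/PolyINF-style update.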
