Rising Rested Bandits: Lower Bounds and Efficient Algorithms

Marco Fiandri; Alberto Maria Metelli; Francesco Trov`o

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

Marco Fiandri, Alberto Maria Metelli, Francesco Trov`o

TL;DR

An algorithm is designed for the rested case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave, providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$.

Abstract

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e. those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. $arm$). We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave. We study the inherent sample complexity of the regret minimization problem by deriving suitable regret lower bounds. Then, we design an algorithm for the rested case $\textit{R-ed-UCB}$, providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

TL;DR

Abstract

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e. those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a.

). We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave. We study the inherent sample complexity of the regret minimization problem by deriving suitable regret lower bounds. Then, we design an algorithm for the rested case

, providing a regret bound depending on the properties of the instance and, under certain circumstances, of

. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

TL;DR

Abstract

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

TL;DR

Abstract

Paper Structure

Table of Contents

Key Result

Figures (14)

Theorems & Definitions (49)