Bandits with Anytime Knapsacks

Eray Can Elumar; Cem Tekin; Osman Yagan

Bandits with Anytime Knapsacks

Eray Can Elumar, Cem Tekin, Osman Yagan

TL;DR

This work proposes SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation and shows that SUAK attains the same problem-dependent regret upper bound of $ O(K \log T)$ established in prior work under the simpler BwK framework.

Abstract

We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round in the decision-making process and skips a round when it is possible to violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $ O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.

Bandits with Anytime Knapsacks

TL;DR

Abstract

Bandits with Anytime Knapsacks

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (2)

Theorems & Definitions (9)