Bandits with Anytime Knapsacks

Eray Can Elumar, Cem Tekin, Osman Yagan

TL;DR

This work proposes SUAK, an algorithm that uses upper confidence bounds to identify the optimal mixture of arms while balancing exploration and exploitation. SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in prior work under the simpler BwK framework.

Abstract

We consider bandits with anytime knapsacks (BwAK), a novel version of the bandits with knapsacks (BwK) problem in which there is an \textit{anytime} cost constraint instead of a total cost budget. This setting is more complex because the constraint must hold throughout the decision-making process, not only at the end. We propose SUAK, an algorithm that uses upper confidence bounds to identify the optimal mixture of arms while balancing exploration and exploitation. SUAK is adaptive: in each round it decides how to use the available budget, and it skips a round when playing could violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.
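
The skipping behavior described above can be illustrated with a minimal sketch. It is not the authors' SUAK algorithm: it omits the upper confidence bounds and the arm mixture entirely and shows only a conservative skip rule. Two assumptions are made here that the abstract does not state: the anytime constraint is taken to mean that cumulative cost after every round t is at most `budget_per_round * t`, and each pull is assumed to cost at most `max_cost`. The function name and all parameter names are hypothetical.

```python
def run_skip_rule(costs, budget_per_round, max_cost=1.0):
    """Play rounds in order, skipping whenever a pull could break the constraint.

    Assumed constraint (an illustration, not taken from the source):
    after every round t, cumulative cost <= budget_per_round * t.
    Each entry of `costs` is the cost that would be paid if round t is played,
    and is assumed to be at most `max_cost`.
    """
    spent = 0.0    # cumulative cost paid so far
    skipped = 0    # number of rounds skipped
    for t, cost in enumerate(costs, start=1):
        # The cost is unknown before pulling, so guard against the worst case.
        if spent + max_cost > budget_per_round * t:
            skipped += 1
            continue
        spent += cost
    return spent, skipped


if __name__ == "__main__":
    spent, skipped = run_skip_rule([0.5] * 10, budget_per_round=0.6)
    print(f"spent={spent}, skipped={skipped}")
```

Under this rule a round is skipped only when even one worst-case pull would exceed the running allowance. Spending slightly less than the allowance in earlier rounds leaves headroom that makes such skips rarer, which is the intuition behind the under-utilization mentioned in the abstract.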

Paper Structure

This paper contains 26 sections, 102 equations, 2 figures, 2 tables, 2 algorithms.

Figures (2)

  • Figure 1: The plots of cumulative empirical regret (Left), number of skips (Middle), and average empirical cost regret (Right)
  • Figure 2: The plots of cumulative empirical regret (First two), number of skips (Third), and average empirical cost regret (Fourth)

Theorems & Definitions (9)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof