Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco

TL;DR

This paper tackles maximizing cumulative reward over a horizon $T$ under an arbitrary set of long-term constraints in bandits with adversarial rewards and possibly stochastic costs. It introduces a simple, optimistic constraint-estimation framework that builds a moving feasible set via an upper-confidence-type bonus and combines it with a moving-set regret minimizer. The resulting algorithm achieves best-of-both-worlds guarantees with a logarithmic dependence on the number of constraints $m$, including a $\tilde{O}(\sqrt{T})$ regret bound in the stochastic setting without Slater's condition and a constant competitive ratio in the adversarial setting that depends on the Slater parameter $\rho$. The approach also provides anytime constraint-violation bounds and shows convergence of the policy toward feasibility in expectation, all with a simpler analysis than primal-dual approaches and without requiring strong adaptivity assumptions.

Abstract

We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods and require some stringent assumptions, namely Slater's condition; in adversarial settings, they either assume knowledge of a lower bound on Slater's parameter or impose strong requirements on the primal and dual regret minimizers, such as weak adaptivity. We propose an alternative and more natural approach based on optimistic estimations of the constraints. Surprisingly, we show that estimating the constraints with a UCB-like approach guarantees optimal performance. Our algorithm consists of two main components: (i) a regret minimizer working on \emph{moving strategy sets} and (ii) an estimate of the feasible set as an optimistic weighted empirical mean of previous samples. The key challenge in this approach is designing adaptive weights that meet the different requirements for stochastic and adversarial constraints. Our algorithm is significantly simpler than previous approaches and has a cleaner analysis. Moreover, ours is the first best-of-both-worlds algorithm providing bounds logarithmic in the number of constraints. Additionally, in stochastic settings, it provides $\widetilde O(\sqrt{T})$ regret \emph{without} Slater's condition.
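
To make component (ii) more concrete, the following is a minimal sketch of a UCB-like optimistic estimate of the feasible set. It is not the paper's algorithm: it uses plain (uniform-weight) empirical means rather than the adaptive weights the abstract refers to, and it assumes costs in $[-1, 1]$, constraints of the form "expected cost at most 0", and a Hoeffding-style bonus. The class name `OptimisticConstraintEstimator` and all of its methods and parameters are hypothetical, introduced only for illustration.

```python
import math


class OptimisticConstraintEstimator:
    """Hypothetical helper: optimistic per-arm cost estimates for m constraints.

    Assumptions (not taken from the paper): costs lie in [-1, 1], a strategy x
    is feasible when its expected cost is <= 0 for every constraint, and past
    samples are weighted uniformly.
    """

    def __init__(self, num_arms, num_constraints, delta=0.05):
        self.num_arms = num_arms
        self.num_constraints = num_constraints
        self.delta = delta
        self.counts = [0] * num_arms
        # sums[i][a] = total cost observed for constraint i when playing arm a
        self.sums = [[0.0] * num_arms for _ in range(num_constraints)]

    def update(self, arm, costs):
        """Record the cost vector (one entry per constraint) observed for `arm`."""
        self.counts[arm] += 1
        for i, cost in enumerate(costs):
            self.sums[i][arm] += cost

    def bonus(self, arm):
        """Hoeffding-style confidence width for costs with range [-1, 1]."""
        n = self.counts[arm]
        log_term = math.log(2 * self.num_arms * self.num_constraints / self.delta)
        return math.sqrt(2 * log_term / n)

    def optimistic_cost(self, constraint, arm):
        """Lower confidence bound on the cost: optimism makes arms look cheaper."""
        n = self.counts[arm]
        if n == 0:
            return -1.0  # unexplored arms are treated as cheaply as possible
        mean = self.sums[constraint][arm] / n
        return max(-1.0, mean - self.bonus(arm))

    def is_optimistically_feasible(self, x):
        """Check whether the mixed strategy x lies in the estimated feasible set."""
        return all(
            sum(x[a] * self.optimistic_cost(i, a) for a in range(self.num_arms)) <= 0
            for i in range(self.num_constraints)
        )


# Usage: two arms, one constraint; arm 0 is repeatedly observed to be costly.
estimator = OptimisticConstraintEstimator(num_arms=2, num_constraints=1)
for _ in range(1000):
    estimator.update(0, [1.0])
print(estimator.is_optimistically_feasible([1.0, 0.0]))
print(estimator.is_optimistically_feasible([0.0, 1.0]))
```

In this sketch the estimated feasible set starts as the whole simplex and shrinks as samples accumulate, which is the sense in which a regret minimizer would have to operate on a moving strategy set. How the weights should be adapted to handle adversarial constraints is exactly the part this example leaves out.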

Paper Structure

This paper contains 21 sections, 30 theorems, 95 equations, 2 algorithms.

Key Result

Theorem 4.1

Let $x_t$ be selected according to alg:RM, run with an arbitrary sequence of convex sets $\widehat{\mathcal{X}}_t\subseteq \Delta_K$ with $\gamma=\tfrac{\beta}{2}$ and $\beta=\sqrt{\frac{\log(K/\delta_1)}{KT}}$. Then, with probability at least $1-\delta_1$, it holds that for any $x\in\bigcap_{t\in\llbr

Theorems & Definitions (46)

  • Theorem 4.1
  • Theorem 5.1
  • Theorem 5.2
  • Lemma 5.2
  • Proposition 5.2
  • Proposition 5.3
  • Theorem 5.4
  • Corollary 5.4
  • Lemma 5.4
  • Theorem 6.1
  • ...and 36 more