Non-stationary Bandit Convex Optimization: A Comprehensive Study

Xiaoqi Liu, Dorian Baudry, Julian Zimmert, Patrick Rebeschini, Arya Akhavan

TL;DR

The paper studies non-stationary Bandit Convex Optimization (BCO) in continuous action spaces under three non-stationarity measures: the number of switches S, the total variation Δ, and the path-length P. It introduces TEWA-SE, a polynomial-time, zeroth-order sleeping-experts algorithm built on one-point gradient estimates. TEWA-SE achieves adaptive (interval) regret guarantees and minimax-optimal dynamic regret bounds for strongly convex losses when S and Δ are known, and it is extended to the parameter-free setting via the Bandit-over-Bandit (BoB) framework. The paper also presents cExO, an exponential-weights method with clipping over a discretized action space. cExO attains minimax-optimal guarantees in S and Δ for general convex losses (although it is not polynomial-time computable and has a worse dependence on the dimension), improves the best existing path-length regret bounds, and has BoB-based variants for unknown non-stationarity. The authors provide matching lower bounds, unify the conversions among regret notions, and highlight the remaining open challenge of designing computationally efficient, minimax-optimal algorithms for general convex non-stationary BCO. Overall, the work advances a unified framework for non-stationary BCO that connects online convex optimization (OCO) techniques with bandit feedback and sets the stage for future efficient second-order methods.
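TEWA-SE relies on one-point gradient estimates because the learner observes a single loss value per round. As a rough illustration, the sketch below implements the classic spherical-smoothing estimator (d/δ)·f(x + δu)·u with u uniform on the unit sphere. This is an assumed stand-in: the estimator used inside TEWA-SE may differ, for example in how the perturbation is sampled or kept inside the domain. The helper names `sample_unit_sphere` and `one_point_gradient` and the smoothing radius `delta` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_unit_sphere(d: int, rng: random.Random) -> List[float]:
    """Draw a direction uniformly from the unit sphere in R^d (normalized Gaussian)."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:  # resample in the (probability-zero) all-zeros case
            return [v / norm for v in z]


def one_point_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    delta: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical helper: spherical one-point estimate (d / delta) * f(x + delta * u) * u.

    Assumption: this classic smoothing estimator is only a stand-in for the
    estimator used in TEWA-SE. Exactly one loss value is queried, matching
    bandit feedback. In expectation the output equals the gradient of the
    smoothed loss E_v[f(x + delta * v)], with v uniform in the unit ball.
    """
    d = len(x)
    u = sample_unit_sphere(d, rng)
    query = [xi + delta * ui for xi, ui in zip(x, u)]
    value = f(query)  # the single bandit observation for this round
    return [(d / delta) * value * ui for ui in u]
```

Averaged over many rounds, the estimates for a linear loss f(x) = a·x concentrate around a. For nonlinear losses they estimate the gradient of a smoothed version of f, and their second moment scales like (d·max|f|/δ)², which is one reason bandit feedback is harder than gradient feedback.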

Abstract

Bandit Convex Optimization is a fundamental class of sequential decision-making problems, where the learner selects actions from a continuous domain and observes a loss (but not its gradient) at only one point per round. We study this problem in non-stationary environments, and aim to minimize the regret under three standard measures of non-stationarity: the number of switches $S$ in the comparator sequence, the total variation $Δ$ of the loss functions, and the path-length $P$ of the comparator sequence. We propose a polynomial-time algorithm, Tilted Exponentially Weighted Average with Sleeping Experts (TEWA-SE), which adapts the sleeping experts framework from online convex optimization to the bandit setting. For strongly convex losses, we prove that TEWA-SE is minimax-optimal with respect to known $S$ and $Δ$ by establishing matching upper and lower bounds. By equipping TEWA-SE with the Bandit-over-Bandit framework, we extend our analysis to environments with unknown non-stationarity measures. For general convex losses, we introduce a second algorithm, clipped Exploration by Optimization (cExO), based on exponential weights over a discretized action space. While not polynomial-time computable, this method achieves minimax-optimal regret with respect to known $S$ and $Δ$, and improves on the best existing bounds with respect to $P$.
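The three non-stationarity measures can be made concrete with a small sketch. It assumes the common definitions S = Σ_{t≥2} 1{u_t ≠ u_{t−1}}, Δ = Σ_{t≥2} sup_x |f_t(x) − f_{t−1}(x)| and P = Σ_{t≥2} ‖u_t − u_{t−1}‖₂, where u_1, …, u_T is the comparator sequence and f_1, …, f_T are the losses. The paper's exact conventions (for example, whether S counts switches or constant segments) may differ, and the supremum in Δ is approximated here by a maximum over a user-supplied finite grid. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def num_switches(comparators: Sequence[Point]) -> int:
    """S (assumed definition): number of rounds t >= 2 where the comparator changes."""
    return sum(
        1 for prev, cur in zip(comparators, comparators[1:]) if list(prev) != list(cur)
    )


def path_length(comparators: Sequence[Point]) -> float:
    """P (assumed definition): sum of Euclidean distances between consecutive comparators."""
    return sum(math.dist(prev, cur) for prev, cur in zip(comparators, comparators[1:]))


def total_variation(
    losses: Sequence[Callable[[Point], float]], grid: Sequence[Point]
) -> float:
    """Delta (assumed definition): sum over t of sup_x |f_t(x) - f_{t-1}(x)|.

    The supremum over the action set is approximated by a max over `grid`.
    """
    return sum(
        max(abs(cur(x) - prev(x)) for x in grid)
        for prev, cur in zip(losses, losses[1:])
    )


comparators = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(num_switches(comparators), path_length(comparators))  # 2 2.0
```

A comparator that drifts slightly every round has a large S but can have a small P, which is why bounds in terms of P can be much tighter when the optimum moves slowly.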

Paper Structure

This paper contains 26 sections, 23 theorems, 131 equations, 1 figure, 1 table, 4 algorithms.

Key Result

Proposition 1

Suppose that an algorithm can be calibrated to satisfy $R^\textsf{ada}(\mathsf{B}, T)\le C \mathsf{B}^\kappa$, for any interval length $\mathsf{B} \in [T]$, for some factor $C>0$ that is at most polynomial in $d$ and $\log(T)$, and $\kappa\in [0, 1)$. Then, for any $S,S_\Delta, S_P \in [T]$, an appr
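Proposition 1 starts from an adaptive-regret guarantee of the form C·B^κ; its statement is truncated above, so the sketch below only illustrates the standard arithmetic for the switching case under stated assumptions, not the paper's exact conclusion (which also involves S_Δ and S_P). Tune the algorithm for B = ⌈T/S⌉ and cut [T] into blocks of length B, plus an extra cut at each of the at most S − 1 comparator switches. Every piece then has length at most B and a fixed comparator, so summing C·B^κ over at most 2S − 1 pieces gives a bound of order C·S^{1−κ}·T^κ. The assumption that the guarantee holds on every interval of length at most B, and the function name `switching_regret_bound`, are hypothetical.

```python
def switching_regret_bound(C: float, kappa: float, T: int, S: int) -> float:
    """Hypothetical illustration: switching-regret bound from adaptive regret C * B**kappa.

    Assumptions (not the paper's exact statement): the algorithm is tuned for
    B = ceil(T / S), its regret on ANY interval of length at most B is at most
    C * B**kappa, and the comparator has at most S constant pieces.
    """
    B = -(-T // S)  # integer ceil(T / S)
    # ceil(T / B) <= S blocks, plus at most S - 1 extra cuts at comparator switches.
    pieces = -(-T // B) + (S - 1)
    # Each piece has a fixed comparator, so its regret is an interval regret.
    return pieces * C * B ** kappa
```

For example, with C = 1, κ = 1/2, T = 10,000 and S = 4, the tuning gives B = 2,500 and 7 pieces, so the bound is about 7·50 = 350. In general the result never exceeds 4·C·S^{1−κ}·T^κ, which is the S^{1−κ}T^κ scaling one expects from such a conversion.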

Figures (1)

  • Figure 1: Conversions between regrets: $R_1 \to R_2$ means that if regret $R_1$ is sublinear in $T$ (or $\mathsf{B}$), then regret $R_2$ is also sublinear in $T$; see the proposition on regret conversions for precise mathematical statements.

Theorems & Definitions (45)

  • Proposition 1
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Corollary 2
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • ...and 35 more