Directional Optimism for Safe Linear Bandits

Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

TL;DR

This work addresses safe linear bandits with an unknown linear constraint $a^\top x \le b$ that must hold at every round. It introduces directional optimism (ROFUL), which selects directions optimistically and scales down to remain safe, achieving $\tilde{O}(d\sqrt{T})$ regret, with problem-dependent refinements for well-separated instances. For finite-star-convex action sets, it proposes Safe-PE, an elimination-based method achieving $\tilde{O}(\sqrt{dT})$ regret with reduced dependence on dimension through logarithmic factors in the number of directions. The paper also extends the framework to linked convex constraints $A x_t \in \mathcal{G}$ using convex-analysis tools, and provides numerical experiments validating the theoretical gains and comparing with prior approaches. Overall, directional optimism yields tighter geometry-aware regret and broadens applicability to more complex constraint structures in safe learning settings.
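The "scale down to remain safe" step can be illustrated with a minimal sketch. It is not the paper's ROFUL implementation: it assumes $b > 0$ (so the origin is safe), that any scaled-down point $\gamma x$ with $\gamma \in [0, 1]$ is still an admissible action, and that a pessimistic constraint estimate of the form $\hat{a}^\top x + \beta \| x \|_{V^{-1}}$ is available. The names `weighted_norm`, `safe_scaling`, and all numeric values are hypothetical.

```python
import math


def weighted_norm(x, V_inv):
    """Return ||x||_{V^{-1}} = sqrt(x^T V^{-1} x) for a list-of-lists V_inv."""
    d = len(x)
    quad = sum(x[i] * V_inv[i][j] * x[j] for i in range(d) for j in range(d))
    return math.sqrt(quad)


def safe_scaling(x, a_hat, V_inv, beta, b):
    """Largest gamma in [0, 1] with a_hat^T (gamma x) + beta ||gamma x||_{V^{-1}} <= b.

    Both terms scale linearly in gamma >= 0, so the pessimistic constraint
    reduces to gamma * upper <= b, where upper is the bound evaluated at x.
    """
    if b <= 0:
        raise ValueError("this sketch assumes b > 0 so that the origin is safe")
    upper = sum(ai * xi for ai, xi in zip(a_hat, x)) + beta * weighted_norm(x, V_inv)
    if upper <= b:
        return 1.0  # x already satisfies the pessimistic constraint
    return b / upper  # here upper > b > 0, so the result lies in (0, 1)


# Hypothetical numbers: an optimistically chosen direction x and current estimates.
x = [1.0, 0.0]
a_hat = [0.5, 0.0]
V_inv = [[1.0, 0.0], [0.0, 1.0]]
gamma = safe_scaling(x, a_hat, V_inv, beta=1.5, b=1.0)
x_played = [gamma * xi for xi in x]
```

In this toy instance the pessimistic bound at `x` is 2.0, so the direction is halved before being played; as the confidence width shrinks, the scaling factor approaches 1.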

Abstract

The safe linear bandit problem is a version of the classical stochastic linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis-based approach.

Paper Structure

This paper contains 40 sections, 29 theorems, 138 equations, 6 figures, 2 tables.

Key Result

Lemma 1

Let Assumptions ass:set_bound, ass:bounded and ass:noise hold, and let $\beta_t$ denote the confidence radius (its defining equation is not reproduced here). Then with probability at least $1 - \delta$, both $| x^\top (\hat{\theta}_t - \theta)| \leq \beta_t \| x \|_{V_t^{-1}}$ and $| x^\top (\hat{a}_t - a)| \leq \beta_t \| x \|_{V_t^{-1}}$ hold for all $x \in \mathbb{R}^d$ and all $t \geq 1$.
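To make the shape of this bound concrete, the following sketch computes the interval $x^\top \hat{\theta}_t \pm \beta_t \| x \|_{V_t^{-1}}$ in two dimensions. It assumes the standard regularized least-squares construction, $V_t = \lambda I + \sum_s x_s x_s^\top$ and $\hat{\theta}_t = V_t^{-1} \sum_s x_s r_s$, which this excerpt does not spell out. The radius `beta` is passed in as a plain number because its formula is not given here, and the function name `ridge_confidence` and the data are hypothetical.

```python
import math


def ridge_confidence(actions, rewards, x, beta, lam=1.0):
    """Return (low, high) bounds on x^T theta from 2-D ridge regression.

    Assumes V = lam * I + sum_s x_s x_s^T and theta_hat = V^{-1} sum_s x_s r_s,
    with interval half-width beta * ||x||_{V^{-1}}.
    """
    # Regularized Gram matrix (symmetric 2x2); positive definite since lam > 0.
    v00 = lam + sum(a[0] * a[0] for a in actions)
    v01 = sum(a[0] * a[1] for a in actions)
    v11 = lam + sum(a[1] * a[1] for a in actions)
    det = v00 * v11 - v01 * v01
    inv = [[v11 / det, -v01 / det], [-v01 / det, v00 / det]]

    # Ridge estimate theta_hat = V^{-1} * sum_s x_s r_s.
    s0 = sum(a[0] * r for a, r in zip(actions, rewards))
    s1 = sum(a[1] * r for a, r in zip(actions, rewards))
    theta_hat = [inv[0][0] * s0 + inv[0][1] * s1, inv[1][0] * s0 + inv[1][1] * s1]

    estimate = x[0] * theta_hat[0] + x[1] * theta_hat[1]
    quad = sum(x[i] * inv[i][j] * x[j] for i in range(2) for j in range(2))
    width = beta * math.sqrt(quad)
    return estimate - width, estimate + width


# Hypothetical noise-free data generated by theta = [1, 2].
low, high = ridge_confidence(
    actions=[[1.0, 0.0], [0.0, 1.0]],
    rewards=[1.0, 2.0],
    x=[1.0, 1.0],
    beta=2.0,
)
```

The same computation with constraint observations in place of rewards would give the interval for $x^\top a$; its upper end is the kind of pessimistic quantity a safe algorithm can compare against $b$.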

Figures (6)

  • Figure 1: Graphical representation of the approach for lower bounding $\gamma_t$ (in ROFUL) for the setting with linked convex constraints.
  • Figure 2: Simulation results of our proposed algorithms (ROFUL, Safe-PE) and the generic expanded confidence set algorithm GenOP (see Section \ref{sec:comp}).
  • Figure 3: Problem-Dependent ROFUL (PD-ROFUL)
  • Figure 4: Safe Phased Elimination (Safe-PE)
  • Figure 5: Simulation results for setting with linked convex constraints and box constraints.
  • ...and 1 more figures

Theorems & Definitions (53)

  • Remark 1
  • Lemma 1: Theorem 2 in abbasi2011improved
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Corollary 1
  • Remark 3
  • Theorem 3
  • Lemma 2
  • proof
  • ...and 43 more