Directional Optimism for Safe Linear Bandits

Spencer Hutchinson; Berkay Turan; Mahnoosh Alizadeh

TL;DR

This work addresses safe linear bandits with an unknown linear constraint $a^\top x \le b$ that must hold at every round. It introduces directional optimism (ROFUL), which selects directions optimistically and scales down to remain safe, achieving $\tilde{O}(d\sqrt{T})$ regret, with problem-dependent refinements for well-separated instances. For finite-star-convex action sets, it proposes Safe-PE, an elimination-based method achieving $\tilde{O}(\sqrt{dT})$ regret with reduced dependence on dimension through logarithmic factors in the number of directions. The paper also extends the framework to linked convex constraints $A x_t \in \mathcal{G}$ using convex-analysis tools, and provides numerical experiments validating the theoretical gains and comparing with prior approaches. Overall, directional optimism yields tighter geometry-aware regret and broadens applicability to more complex constraint structures in safe learning settings.
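
As a rough illustration of the "select a direction optimistically, then scale down to remain safe" idea, the following is a minimal sketch, not the paper's ROFUL algorithm. It assumes a hypothetical action set made of finitely many unit directions scaled by a radius in $[0, r_{\max}]$, a threshold $b > 0$ (so the origin is always safe), and estimates `theta_hat`, `a_hat` with a shared confidence width `beta * ||x||_{V^{-1}}`. All function and parameter names are illustrative.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def weighted_norm(x, V_inv):
    """Return ||x||_{V^{-1}} = sqrt(x^T V^{-1} x)."""
    Vx = [dot(row, x) for row in V_inv]
    return math.sqrt(max(dot(x, Vx), 0.0))

def directional_optimism_step(directions, theta_hat, a_hat, V_inv, beta, b, r_max=1.0):
    """Pick a direction optimistically, then shrink its scale until provably safe.

    Hypothetical setup: actions are r * u with u a unit direction, 0 <= r <= r_max,
    and b > 0 so that the zero action is always safe.
    """
    best = None
    for u in directions:
        width = beta * weighted_norm(u, V_inv)
        reward_ucb = dot(theta_hat, u) + width  # highest plausible reward per unit scale
        cost_lcb = dot(a_hat, u) - width        # lowest plausible constraint value
        cost_ucb = dot(a_hat, u) + width        # highest plausible constraint value
        # Largest scale that is feasible under the most favourable plausible constraint.
        r_opt = r_max if cost_lcb <= 0 else min(r_max, b / cost_lcb)
        if reward_ucb <= 0:
            r_opt = 0.0  # moving along u cannot plausibly beat the zero action
        value = r_opt * reward_ucb
        if best is None or value > best[0]:
            best = (value, u, r_opt, cost_ucb)
    _, u, r_opt, cost_ucb = best
    # Largest scale that is feasible under every plausible constraint.
    r_safe = r_opt if cost_ucb <= 0 else min(r_opt, b / cost_ucb)
    return [r_safe * ui for ui in u], u, r_safe

# Toy usage with made-up numbers.
action, direction, scale = directional_optimism_step(
    directions=[[1.0, 0.0], [0.0, 1.0]],
    theta_hat=[1.0, 0.5],
    a_hat=[1.0, 0.0],
    V_inv=[[0.25, 0.0], [0.0, 0.25]],
    beta=1.0,
    b=0.5,
)
print(action, direction, scale)
```

The direction is chosen using the most favourable plausible reward and constraint, while the played scale uses the least favourable plausible constraint, so the sketch never plays an action it cannot certify as safe under its own confidence bounds.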

Abstract

The safe linear bandit problem is a version of the classical stochastic linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based approach.

Paper Structure (40 sections, 29 theorems, 138 equations, 6 figures, 2 tables)

Introduction
Contributions
Related Work
Preliminaries
Notation
Problem Setup
Technical Approach
Restrained Optimism Algorithm
Optimistic Direction Selection
Confidence Sets for Unknown Parameters
General Analysis
Problem-dependent Analysis
Wrong Directions are Rarely Selected
Nearly Dimension-free Regret
Comparison with Existing Algorithms
...and 25 more sections

Key Result

Lemma 1

Let Assumptions ass:set_bound, ass:bounded, and ass:noise hold. Also, let $\beta_t$ be the confidence radius, whose defining expression is not reproduced here. Then, with probability at least $1 - \delta$, it holds that both $| x^\top (\hat{\theta}_t - \theta)| \leq \beta_t \| x \|_{V_t^{-1}}$ and $| x^\top (\hat{a}_t - a)| \leq \beta_t \| x \|_{V_t^{-1}}$ for all $x \in \mathbb{R}^d$ and all $t \geq 1$.
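
One way to read this lemma in the safe setting is as a certificate of safety. The short derivation below is a sketch of that step, using the constraint $a^\top x \le b$ from the TL;DR; the direction $u$ and scale $\gamma$ are illustrative symbols and are not taken from the lemma itself.

```latex
% On the event of Lemma 1, for every x in R^d and every t >= 1:
a^\top x \;\le\; \hat{a}_t^\top x + \beta_t \| x \|_{V_t^{-1}} .

% Both terms on the right are positively homogeneous in x, so for a
% direction u and a scale \gamma \ge 0:
\gamma \left( \hat{a}_t^\top u + \beta_t \| u \|_{V_t^{-1}} \right) \le b
\;\Longrightarrow\;
a^\top (\gamma u) \le b .
```

In words, shrinking the scale $\gamma$ until the upper confidence bound on the constraint value drops to $b$ is enough to guarantee that the scaled action satisfies the true constraint whenever the lemma's event holds.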

Figures (6)

Figure 1: Graphical representation of the approach for lower bounding $\gamma_t$ (in ROFUL) for the setting with linked convex constraints.
Figure 2: Simulation results of our proposed algorithms (ROFUL, Safe-PE) and the generic expanded confidence set algorithm GenOP (see the section "Comparison with Existing Algorithms").
Figure 3: Problem-Dependent ROFUL (PD-ROFUL)
Figure 4: Safe Phased Elimination (Safe-PE)
Figure 5: Simulation results for setting with linked convex constraints and box constraints.
...and 1 more figures

Theorems & Definitions (53)

Remark 1
Lemma 1 (Theorem 2 in Abbasi-Yadkori et al., 2011)
Theorem 1
Remark 2
Theorem 2
Corollary 1
Remark 3
Theorem 3
Lemma 2
proof
...and 43 more
