No-Regret Algorithms for Safe Bayesian Optimization with Monotonicity Constraints

Arpan Losalka, Jonathan Scarlett

TL;DR

The paper addresses safe Bayesian optimization where a monotone safety function $g$ in a safety variable $s$ constrains the feasible actions, while the objective $f$ need not be monotone. It introduces M-SafeOpt, a GP-based algorithm that eliminates unsafe or suboptimal choices, expands only along the safe boundary, and uses a tailored acquisition to balance exploration and exploitation, achieving sublinear regret for several goals. Theoretical guarantees are provided via information-theoretic bounds that scale with growth constants and GP information gains, along with a refined acquisition variant. Empirical validation on simulated clinical-trial-like problems and synthetic 2D tasks demonstrates safe sampling and improved performance over baselines, highlighting the method’s practical impact for safe, data-efficient optimization in safety-critical domains.

Abstract

We consider the problem of sequentially maximizing an unknown function $f$ over a set of actions of the form $(s,\mathbf{x})$, where the selected actions must satisfy a safety constraint with respect to an unknown safety function $g$. We model $f$ and $g$ as lying in a reproducing kernel Hilbert space (RKHS), which facilitates the use of Gaussian process methods. While existing works for this setting have provided algorithms that are guaranteed to identify a near-optimal safe action, the problem of attaining low cumulative regret has remained largely unexplored, with a key challenge being that expanding the safe region can incur high regret. To address this challenge, we show that if $g$ is monotone with respect to just the single variable $s$ (with no such constraint on $f$), sublinear regret becomes achievable with our proposed algorithm. In addition, we show that a modified version of our algorithm is able to attain sublinear regret (for suitably defined notions of regret) for the task of finding a near-optimal $s$ corresponding to every $\mathbf{x}$, as opposed to only finding the global safe optimum. Our findings are supported with empirical evaluations on various objective and safety functions.
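To illustrate why monotonicity of $g$ in $s$ helps, here is a minimal sketch of how a single confident observation can certify a whole range of $s$-values as safe. It is not the paper's M-SafeOpt algorithm. It assumes (none of this is stated in the text above) that an action is safe when $g(s,\mathbf{x}) \le h$ for a known threshold $h$, that $g$ is non-decreasing in $s$, and that an upper confidence bound on $g$ is available on a finite grid. The names `certified_safe_mask`, `ucb_g`, and `threshold` are hypothetical.

```python
import numpy as np

def certified_safe_mask(ucb_g, threshold):
    """Return a boolean mask of grid points that can be certified safe.

    ucb_g: array of shape (n_s, n_x), upper confidence bounds on g,
           with rows ordered by increasing s (assumed layout).
    threshold: assumed safety threshold h, with "safe" meaning g <= h.

    Assumption: g is non-decreasing in s. Then, if ucb_g(s', x) <= h
    for some s' >= s, we get g(s, x) <= g(s', x) <= h, so (s, x) is safe.
    """
    below = ucb_g <= threshold
    # Reverse cumulative OR along the s-axis: a point is certified
    # if any grid point with the same x and larger-or-equal s is below h.
    return np.logical_or.accumulate(below[::-1], axis=0)[::-1]

# Toy upper confidence bounds on a 4 x 2 grid (illustrative values only).
ucb_g = np.array([
    [0.2, 0.9],
    [0.8, 0.4],
    [0.3, 1.2],
    [1.5, 1.1],
])
mask = certified_safe_mask(ucb_g, threshold=0.5)

# Index of the largest certified-safe s for each x (-1 if none).
boundary = mask.sum(axis=0) - 1
```

In this toy grid, the point in row 1, column 0 has a loose bound (0.8) but is still certified, because a larger $s$ in the same column has a bound below the threshold. Without the monotonicity assumption, each point would have to be certified on its own.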

Paper Structure (59 sections, 4 theorems, 71 equations, 7 figures, 1 algorithm)

Key Result

Lemma 1

For any $\delta > 0$, the parameters provide $(1-\delta)$-valid confidence bounds.
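
For orientation, the following sketch shows the form that valid confidence bounds commonly take in the GP bandit literature. The lemma above does not spell out its parameters, so the symbols $\mu_{t-1}$, $\sigma_{t-1}$, and $\beta_t$ (posterior mean, posterior standard deviation, and confidence parameter) are assumptions, and the paper's exact definition may differ.

```latex
% Assumed standard form; z = (s, x) denotes an action.
\[
  \mathrm{lcb}_t(\mathbf{z}) = \mu_{t-1}(\mathbf{z}) - \beta_t^{1/2}\,\sigma_{t-1}(\mathbf{z}),
  \qquad
  \mathrm{ucb}_t(\mathbf{z}) = \mu_{t-1}(\mathbf{z}) + \beta_t^{1/2}\,\sigma_{t-1}(\mathbf{z}),
\]
\[
  \Pr\Big[\, \mathrm{lcb}_t(\mathbf{z}) \le f(\mathbf{z}) \le \mathrm{ucb}_t(\mathbf{z})
  \;\; \text{for all } \mathbf{z} \text{ and all } t \ge 1 \Big] \;\ge\; 1 - \delta .
\]
```

Under this reading, "$(1-\delta)$-valid" means the bounds hold simultaneously for all actions and rounds with probability at least $1-\delta$, with analogous bounds for $g$.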

Figures (7)

  • Figure 1: Actions sampled by M-SafeOpt in case 1 (top row) and case 2 (bottom row), along with the safe boundaries discovered in blue and true safe boundaries in red (2nd and 4th column). In case 2, the 1st and 3rd columns also show the optimal $s$ discovered for every $\mathbf{x}$ in cyan, and the true optimal $s$-values in magenta.
  • Figure 2: The top row shows the regret plots for the simulated clinical trial experiment, and the bottom row shows those for the synthetic 2D experiment, for M-SafeOpt and the baseline algorithms. The first column shows $R_t/t$ (for Case 1), while columns 2 and 3 show $R'_t/t$ and $R^{\mathcal{X}}_t/t$ (for Case 2). The corresponding instantaneous regret values are shown using markers.
  • Figure 3: The first two columns show the actions sampled by the SafeOpt (row 1) and PredVar (row 2) algorithms for the simulated clinical trial experiment, while the last two columns show the corresponding plots for the synthetic 2D experiment (from the experiments section). The true safe boundary is shown in red and the boundary discovered by the algorithm is shown in blue (in columns 2 and 4).
  • Figure 4: The first column shows the normalized cumulative regret, $R_t/t$, incurred by M-SafeOpt in the simulated clinical trial experiment for different values of $c$ (where $L_f$ is replaced by $cL_f$ and $L_g'$ by $L_g'/c$), while the second column shows the corresponding plot for $R'_t/t$. The third column corresponds to the synthetic 2D experiment (showing $R_t/t$). The instantaneous regret values are shown using markers.
  • Figure 5: The first column shows the normalized cumulative regret, $R_t/t$, for the synthetic 3D experiment, while the second column shows that for the pendulum swing-up experiment. The corresponding instantaneous regret values are shown using markers.
  • ...and 2 more figures
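
The regret plots described in the captions above report the normalized cumulative regret $R_t/t$. As a minimal sketch, assuming $R_t$ is the running sum of the instantaneous regret values (the usual definition; the helper name `normalized_cumulative_regret` and the sample values are hypothetical), the curve can be computed as follows:

```python
import numpy as np

def normalized_cumulative_regret(instantaneous_regret):
    """Return R_t / t for t = 1..T, where R_t = r_1 + ... + r_t."""
    r = np.asarray(instantaneous_regret, dtype=float)
    t = np.arange(1, r.size + 1)
    return np.cumsum(r) / t

# Illustrative instantaneous regrets (not taken from the paper).
r = [1.0, 0.5, 0.0, 0.0]
curve = normalized_cumulative_regret(r)
# curve is [1.0, 0.75, 0.5, 0.375]
```

A sublinear cumulative regret corresponds to this curve tending to zero as $t$ grows, which is why $R_t/t$ is a convenient quantity to plot.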

Theorems & Definitions (4)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 3