Information-Theoretic Safe Bayesian Optimization
Alessandro G. Bottero, Carlos E. Luis, Julia Vinogradska, Felix Berkenkamp, Jan Peters
TL;DR
This work tackles safe Bayesian optimization when the safety constraint is unknown, formulating the problem to maximize an unknown objective while safely exploring the domain. It introduces Information-Theoretic Safe Exploration (ISE), which directly maximizes information gain about parameter safety, and combines it with Max-Value Entropy Search (MES) to yield ISE-BO, a method that naturally handles continuous domains without extra hyperparameters. Theoretical results show that the approach expands the largest reachable safe set and converges to the safe optimum within that set with arbitrary precision, while empirical evaluations demonstrate improved data-efficiency and scalability across synthetic, high-noise, and control tasks. The proposed framework offers a principled, information-driven mechanism for safe exploration and optimization with practical impact for robotics and safety-critical systems.
Abstract
We consider a sequential decision-making task, where the goal is to optimize an unknown function without evaluating parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown functions and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. The combination of this exploration criterion with a well-known Bayesian optimization acquisition function yields a novel safe Bayesian optimization selection criterion. Our approach is naturally applicable to continuous domains and does not require additional explicit hyperparameters. We theoretically analyze the method and show that we do not violate the safety constraint with high probability and that we learn about the value of the safe optimum up to arbitrary precision. Empirical evaluations demonstrate improved data-efficiency and scalability.
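The "common approach" mentioned in the abstract, a Gaussian process prior on the constraint with evaluations restricted to points that are safe with high probability, can be illustrated with a minimal sketch. This is not the ISE criterion itself. Everything specific here is an assumption made for illustration: the squared-exponential kernel and its parameters, the convention that a non-negative constraint value means "safe", the confidence multiplier `beta`, the single known-safe seed at `x = 0`, and the evaluation grid, which is used only to display the result.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_std=0.05):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def safe_mask(mean, std, beta=2.0):
    """Mark points whose lower confidence bound on the constraint is >= 0."""
    return mean - beta * std >= 0.0

# Assumed setup: one noisy observation of the constraint at a known-safe seed.
x_train = np.array([0.0])
y_train = np.array([1.0])
x_grid = np.linspace(-2.0, 2.0, 81)

mean, std = gp_posterior(x_train, y_train, x_grid)
mask = safe_mask(mean, std)
print("points currently classified as safe:", x_grid[mask])
```

With a single observation, only a small neighborhood of the seed is classified as safe. A safe exploration criterion then has to choose which point inside that set to evaluate next so that the set can grow.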
