Safe learning-based control via function-based uncertainty quantification

Abdullah Tokmak, Toni Karvonen, Thomas B. Schön, Dominik Baumann

Abstract

Uncertainty quantification is essential when deploying learning-based control methods in safety-critical systems. This is commonly realized by constructing uncertainty tubes that enclose the unknown function of interest, e.g., the reward and constraint functions or the underlying dynamics model, with high probability. However, existing approaches for uncertainty quantification typically rely on restrictive assumptions on the unknown function, such as known bounds on functional norms or Lipschitz constants, and struggle with discontinuities. In this paper, we model the unknown function as a random function from which independent and identically distributed realizations can be generated, and construct uncertainty tubes via the scenario approach that hold with high probability and rely solely on the sampled realizations. We integrate these uncertainty tubes into a safe Bayesian optimization algorithm, which we then use to safely tune control parameters on a real Furuta pendulum.
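
As a concrete illustration of the construction described above, the sketch below builds pointwise uncertainty tubes from i.i.d. realizations of a random function. This is a minimal example under our own assumptions, not the paper's exact formulation: the domain, the grid, and the generator `sample_realization` are hypothetical and chosen for illustration only. The tube is simply the pointwise envelope of the sampled scenarios; scenario theory is what turns this envelope into a high-probability containment guarantee for a new, unseen realization.

```python
# Hedged sketch: pointwise uncertainty tubes from i.i.d. realizations of a random function.
# `sample_realization` is a hypothetical generator, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 200)   # discretized input domain (illustrative choice)

def sample_realization(x):
    """Draw one i.i.d. realization of a (hypothetical) random function on the grid."""
    a = rng.normal(1.0, 0.2)
    b = rng.normal(3.0, 0.5)
    phi = rng.uniform(0.0, np.pi)
    return a * np.sin(b * x + phi)

N = 100                                # number of sampled scenarios
realizations = np.stack([sample_realization(x_grid) for _ in range(N)])

# Uncertainty tube: pointwise envelope over all sampled realizations.
# Scenario theory bounds the probability that a new realization leaves this tube.
upper_tube = realizations.max(axis=0)
lower_tube = realizations.min(axis=0)
```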

Paper Structure

This paper contains 14 sections, 10 equations, 7 figures, and 2 algorithms.

Figures (7)

  • Figure 3: Control parameter tuning using safe BO on a real Furuta pendulum. Figure \ref{fig:1a} shows the experimental setup of the Furuta pendulum \cite{furuta1992swing}, and Figure \ref{fig:1b} illustrates the exploration of the parameter space. We first (upper sub-figure) conduct an experiment with the initial safe parameter (magenta diamond). Based on the observed reward and constraints, we construct uncertainty tubes, where red and blue denote high and low reward estimates, respectively. Using these estimates, we compute a safe set (green hull) that contains only parameters we believe to be safe with high probability. We sequentially evaluate new parameters using our safe BO algorithm. After 20 iterations (lower sub-figure), we have explored parts of the domain, expanded the safe set, and refined the uncertainty tubes. The cyan square marks the best parameter identified by the safe BO algorithm, which achieves significantly better control performance than the initial parameter, as detailed in Section \ref{sec:furuta}. A generic code sketch of this safe exploration loop is given after the figure list.
  • Figure 4: Uncertainty tubes established in Theorem \ref{th:classic_scenario}. The unknown function $h_0$ (blue) is contained in the uncertainty tubes (orange). The light-orange functions show the $s_t \leq m_t$ support scenarios that contribute to the solution of \eqref{eq:scenario_opt}.
  • Figure 5: Uncertainty tubes established in Algorithm \ref{alg:wait} and Theorem \ref{th:uncertainty_tubes}. The uncertainty tubes following the wait-and-judge framework (gray) are tighter than those of the classic scenario approach (orange).
  • Figure 6: Synthetic safe BO example. We start (upper sub-figure) by executing Algorithm \ref{alg:safe_BO} with one initial sample $S_0$. After $T=30$ iterations (lower sub-figure), Algorithm \ref{alg:safe_BO} has explored the domain, identified the global maximizer, and conducted no unsafe experiments.
  • Figure 7: Evolution of the (scaled) reward and the constraints during the hardware experiment. The blue points denote the observed reward, while the blue line shows the running maximum. The red stars show the minimum of the two constraint values at each iteration, which always remains above the safety threshold (dashed red line).
  • ...and 2 more figures
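
To make the exploration loop illustrated in Figure 3 easier to follow, the sketch below shows a generic safe-exploration pattern: keep only parameters whose constraint lower bound clears the safety threshold and, among those, evaluate the one with the most optimistic reward estimate. This is a hedged illustration, not the paper's algorithm; `reward_ucb`, `constraint_lcb`, and `run_experiment` are hypothetical stand-ins for the tube-based bounds and the physical Furuta-pendulum experiment.

```python
# Generic safe-exploration loop (sketch only, not the paper's algorithm).
# `reward_ucb`, `constraint_lcb`, and `run_experiment` are hypothetical callables.
def safe_bo(candidates, x0, reward_ucb, constraint_lcb, run_experiment,
            threshold=0.0, iters=20):
    evaluated = [x0]
    observations = [run_experiment(x0)]        # each observation: (reward, constraint)
    best_x, best_reward = x0, observations[0][0]
    for _ in range(iters):
        # Safe set: candidates whose constraint lower bound clears the threshold.
        safe = [x for x in candidates
                if constraint_lcb(x, evaluated, observations) >= threshold]
        if not safe:
            break
        # Evaluate the safe candidate with the most optimistic reward estimate.
        x_next = max(safe, key=lambda x: reward_ucb(x, evaluated, observations))
        reward, constraint = run_experiment(x_next)
        evaluated.append(x_next)
        observations.append((reward, constraint))
        if reward > best_reward:
            best_x, best_reward = x_next, reward
    return best_x, best_reward
```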
