
Temporal Logic Control of Nonlinear Stochastic Systems with Online Performance Optimization

Alessandro Riccardi, Thom Badings, Luca Laurenti, Alessandro Abate, Bart De Schutter

Abstract

The deployment of autonomous systems in safety-critical environments requires control policies that guarantee satisfaction of complex control specifications. These systems are commonly modeled as nonlinear discrete-time stochastic systems. A~popular approach to computing a policy that provably satisfies a complex control specification is to construct a finite-state abstraction, often represented as a Markov decision process (MDP) with intervals of transition probabilities, i.e., an interval MDP (IMDP). However, existing abstraction techniques compute a \emph{single policy}, thus leaving no room for online cost or performance optimization, e.g., of energy consumption. To overcome this limitation, we propose a novel IMDP abstraction technique that yields a \emph{set of policies}, each of which satisfies the control specification with a certain minimum probability. We can thus use any online control algorithm to search through this set of verified policies while retaining the guaranteed satisfaction probability of the entire policy set. In particular, we employ model predictive control (MPC) to minimize a desired cost function that is independent of the control specification considered in the abstraction. Our experiments demonstrate that our approach yields better control performance than state-of-the-art single-policy abstraction techniques, with a small degradation of the guarantees.
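The core idea of optimizing within a certified policy set can be sketched in a few lines (a minimal illustration; the names `certified_action_set`, `mpc_step`, and the toy data are hypothetical, not from the paper): after robust value iteration on the IMDP, every abstract action whose worst-case satisfaction probability clears the required threshold remains certified, and an online controller is free to minimize any cost over that set without voiding the guarantee.

```python
def certified_action_set(action_lb, threshold):
    """Actions whose worst-case (lower-bound) satisfaction probability
    meets the required threshold; any of them preserves the guarantee."""
    return [a for a, p in action_lb.items() if p >= threshold]

def mpc_step(state, value_lb, threshold, cost):
    """Among the certified actions at `state`, pick the cheapest one."""
    certified = certified_action_set(value_lb[state], threshold)
    if not certified:
        raise ValueError("no action certified at this state")
    return min(certified, key=lambda a: cost(state, a))

# Toy data: three abstract actions with lower-bound satisfaction
# probabilities; a3 is cheapest but falls below the 0.9 threshold.
value_lb = {"s0": {"a1": 0.95, "a2": 0.91, "a3": 0.60}}
stage_cost = lambda s, a: {"a1": 3.0, "a2": 1.0, "a3": 0.1}[a]
```

Here `mpc_step("s0", value_lb, 0.9, stage_cost)` returns `"a2"`: the uncertified `a3` is excluded, and `a2` is cheaper than `a1`.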


Paper Structure

This paper contains 19 sections, 3 theorems, 30 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let ${\mathbf{S}} \preceq_{\textnormal{alt}} {\mathbf{M}}$. Then for every IMDP policy ${\sigma} \colon {\mathcal{S}} \to {\mathcal{A}}$ and every specification ${\mathbf{Y}} \subseteq \prod_{k=1}^\infty (2^{\mathcal{Y}})$, it holds that
$$\Pr\nolimits_{{\pi}}^{{\mathbf{M}}}({\mathbf{Y}}) \,\geq\, \underline{\Pr}\nolimits_{{\sigma}}^{{\mathbf{S}}}({\mathbf{Y}}),$$
where the policy ${\pi} \colon {\mathcal{X}} \to {\mathcal{U}}$ is defined for all $x \in {\mathcal{X}}$ by applying the interface (Definition 2) to the abstract action that ${\sigma}$ selects in the abstract state containing $x$.
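The policy refinement of Theorem 1 can be sketched as a simple composition (an illustrative sketch only; `abstract_state_of`, `interface`, and the toy maps below are assumed placeholders, not the paper's notation): the concrete policy chains the partition map $Q \colon {\mathcal{X}} \to {\mathcal{S}}$, the IMDP policy $\sigma$, and the interface that turns an abstract action into a concrete input.

```python
def refine_policy(sigma, abstract_state_of, interface):
    """Build a concrete-state policy pi from an IMDP policy sigma:
    pi(x) = interface(sigma(Q(x)), x), with Q the partition map."""
    def pi(x):
        s = abstract_state_of(x)   # abstract state containing x
        a = sigma(s)               # abstract action chosen by sigma
        return interface(a, x)     # concrete input realizing action a
    return pi

# Toy 1-D example: two abstract states split at x = 0.
sigma = {0: "left", 1: "right"}.__getitem__
abstract_state_of = lambda x: int(x >= 0.0)
interface = lambda a, x: (a, x)    # stand-in: records action and state
pi = refine_policy(sigma, abstract_state_of, interface)
```

With these toy maps, `pi(1.5)` evaluates to `("right", 1.5)`: the state falls in abstract state 1, $\sigma$ picks `"right"`, and the stand-in interface pairs that action with the state.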

Figures (5)

  • Figure 1: Our framework leverages formal abstractions to compute a set of certified policies $\tilde{\Pi}$, together with online MPC to optimize a cost function within this certified set.
  • Figure 2: Our abstraction is based on a partition of the state space ${\mathcal{X}}$ (left; showing a reach-avoid problem with goal states in cyan and avoid states in grey), and each abstract action $a_i$ corresponds to an $L_p$-ball $B(u_i,\epsilon_i)$ in the input space ${\mathcal{U}}$ (right; with $L_p$-balls in cyan and their centers in red).
  • Figure 3: Satisfaction probability of the control specification and online MPC performance as functions of the area of the $L_\infty$-balls. Increasing the ball size beyond the elbow value sharply decreases the satisfaction probability, while the cost function $J$ decreases (i.e., online MPC performance improves) up to the elbow point in $\lambda$ and then increases rapidly. The best choice of $\epsilon$ thus lies exactly at the elbow: the greatest performance improvement for the smallest reduction in guarantees.
  • Figure 4: Heat maps of the lower bound of the satisfaction probability for the double integrator.
  • Figure 7: Simulated state trajectories for the double integrator, comparing the baseline policy (without MPC) against the policy with online MPC.
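The abstraction geometry described for Figure 2 can be sketched in code (an illustrative assumption of a uniform rectangular grid; the function names and layout are not from the paper): a continuous state maps to the index of its partition cell, and an input is admissible for abstract action $a_i$ iff it lies in the $L_\infty$-ball $B(u_i, \epsilon_i)$.

```python
def cell_index(x, lower, upper, n_cells):
    """Map a continuous state x to the index of its rectangular
    partition cell on a uniform grid over [lower, upper]."""
    idx = []
    for xi, lo, hi, n in zip(x, lower, upper, n_cells):
        j = int((xi - lo) / (hi - lo) * n)
        idx.append(min(max(j, 0), n - 1))   # clamp boundary states
    return tuple(idx)

def in_ball_inf(u, center, eps):
    """Check that input u lies in the L_inf-ball B(center, eps)."""
    return max(abs(ui - ci) for ui, ci in zip(u, center)) <= eps
```

For example, on a 4x4 grid over the unit square, `cell_index([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], [4, 4])` returns `(2, 2)`.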

Theorems & Definitions (16)

  • Example 1
  • Remark 1
  • Example 2: continues=ex:dubins
  • Example 3
  • Definition 1: IMDP
  • Definition 2: Interface
  • Definition 3: Set-valued interface
  • Definition 4: Lifting (Haesaert et al., 2017)
  • Definition 5: Probabilistic alternating simulation
  • Theorem 1: Policy refinement
  • ...and 6 more