Distributionally Robust Policy and Lyapunov-Certificate Learning

Kehan Long, Jorge Cortes, Nikolay Atanasov

TL;DR

This article presents methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. The core idea is a distributionally robust formulation of the Lyapunov derivative chance constraint, which ensures a monotonic decrease of the Lyapunov certificate.

Abstract

This article presents novel methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. A key challenge in designing controllers with stability guarantees for uncertain systems is the accurate determination of and adaptation to shifts in model parametric uncertainty during online deployment. We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate. To avoid the computational complexity involved in dealing with the space of probability measures, we identify a sufficient condition in the form of deterministic convex constraints that ensures the Lyapunov derivative constraint is satisfied. We integrate this condition into a loss function for training a neural network-based controller and show that, for the resulting closed-loop system, the global asymptotic stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution (OoD) model uncertainties. To demonstrate the efficacy and efficiency of the proposed methodology, we compare it with an uncertainty-agnostic baseline approach and several reinforcement learning approaches in two control problems in simulation.
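
For orientation, a distributionally robust chance constraint of the kind the abstract describes is commonly written in the generic form below. The notation is an assumption made for illustration and is not the paper's exact statement: $\hat{\mathbb{P}}_N$ denotes the empirical distribution of $N$ offline samples of the uncertainty $\boldsymbol{\xi}$, $W$ is a Wasserstein distance, $r$ is the ball radius, and $\epsilon$ is the tolerated violation probability.

$$
\inf_{\mathbb{P} \in \mathcal{M}_N^r} \mathbb{P}\big(\dot{V}(\mathbf{x}, \boldsymbol{\xi}) \le 0\big) \ge 1 - \epsilon,
\qquad
\mathcal{M}_N^r = \big\{ \mathbb{P} : W(\mathbb{P}, \hat{\mathbb{P}}_N) \le r \big\}.
$$

Since CVaR upper-bounds VaR at the same level, requiring $\sup_{\mathbb{P} \in \mathcal{M}_N^r} \textrm{CVaR}_{1-\epsilon}^{\mathbb{P}}\big(\dot{V}(\mathbf{x}, \boldsymbol{\xi})\big) \le 0$ is a standard conservative sufficient condition for a constraint of this shape. Whether the paper's deterministic convex constraints take exactly this route is not stated in this summary.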

Paper Structure

This paper contains 18 sections, 6 theorems, 52 equations, and 5 figures.

Key Result

Lemma 5.2

Assume the distribution $\mathbb{P}^*$ of $\boldsymbol{\xi}$ in eq: uncertain_system is light-tailed and the Wasserstein radius $r_N(\bar{\epsilon})$ is set according to eq: wasserstein_r_guarantee. If the controller $\boldsymbol{\pi}^*(\mathbf{x})$ and Lyapunov function $V^*(\mathbf{x})$ pair satis

Figures (5)

  • Figure 1: Illustration of VaR and CVaR within the distribution of a random variable. $\textrm{VaR}_{1-\epsilon}$ is the $(1-\epsilon)$-quantile of the random variable, while $\textrm{CVaR}_{1-\epsilon}$ is the expected value of the realizations above VaR.
  • Figure 2: Comparison of trajectories for the inverted pendulum system with test case parameters: mass $m=1.1$, length $l=1.0$, and damping $b=0.18$. The $10$ randomly sampled initial states are marked as green dots, while the final states are marked as red crosses. The states $(\theta, \dot{\theta}) = (2k\pi, 0)$ for $k \in \mathbb{N}$ are stable equilibrium states, while the points $((2k-1)\pi, 0)$ are unstable equilibrium points, corresponding to the upside-down position of the pendulum. In (a), the baseline controller, trained using the average mass and damping from offline observations, fails to stabilize the pendulum to the upright position. In (b), the distributionally robust (DR) controller successfully stabilizes the pendulum to the upright position for every initial state, demonstrating improved robustness to distributional shifts in the system parameters.
  • Figure 3: Comparison of Lyapunov function values, control inputs, and RL value functions for the initial state $(\pi, 0)$. (a) Lyapunov function values over time for the baseline and DR controllers. (b) Control inputs over time for the baseline and DR controllers. (c) Value function values over time for the SAC and PPO algorithms. (d) Control inputs generated by the SAC and PPO algorithms.
  • Figure 4: Comparison of Lyapunov function values, control inputs, and RL value functions for the initial state $(-\pi/2, 5.5)$. (a) Lyapunov function values over time for the baseline and DR controllers. (b) Control inputs over time for the baseline and DR controllers. (c) Value function values over time for the SAC and PPO algorithms. (d) Control inputs generated by the SAC and PPO algorithms.
  • Figure 5: Comparison of trajectories for the mountain car system with test case power parameter $p=0.0012$. The $10$ randomly sampled initial states are marked as green dots, while the final states are marked as red crosses. In (a), the baseline controller, trained using the average power from offline observations, fails to stabilize the car to the desired equilibrium $(\pi/6, 0)$. In (b), the distributionally robust (DR) controller successfully stabilizes the car to the top of the mountain for every initial state, demonstrating improved robustness to distributional shifts in the system parameters.
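
The VaR and CVaR quantities described in the Figure 1 caption can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: the function name `empirical_var_cvar`, the sample data, and the choice of NumPy's default linear-interpolation quantile are all assumptions made for illustration.

```python
import numpy as np


def empirical_var_cvar(samples, epsilon):
    """Estimate VaR_{1-eps} and CVaR_{1-eps} from a 1-D array of samples.

    Hypothetical helper for illustration only.
    VaR is taken as the empirical (1 - epsilon)-quantile, and CVaR as the
    mean of the samples at or above that quantile.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 1 or samples.size == 0:
        raise ValueError("samples must be a non-empty 1-D array")
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")

    # (1 - epsilon)-quantile of the empirical distribution.
    var = np.quantile(samples, 1.0 - epsilon)

    # Average of the upper tail; the maximum sample always belongs to it,
    # so the tail is never empty.
    cvar = samples[samples >= var].mean()
    return float(var), float(cvar)


# Usage: samples of some scalar quantity Z (assumed standard normal here).
rng = np.random.default_rng(0)
z = rng.normal(size=10_000)
var, cvar = empirical_var_cvar(z, epsilon=0.1)
print(f"VaR: {var:.3f}, CVaR: {cvar:.3f}")
```

Because CVaR averages only the realizations at or above VaR, it is never smaller than VaR. This is why a bound of the form $\textrm{CVaR}_{1-\epsilon}(Z) \le 0$ is a conservative way to enforce $\mathbb{P}(Z \le 0) \ge 1-\epsilon$.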

Theorems & Definitions (25)

  • Remark 5.1: Choice of Wasserstein ball radius
  • proof
  • Lemma 5.2: Chance-constraint satisfaction under the true distribution
  • proof
  • Lemma 5.3: Global asymptotic stability in probability
  • proof
  • Remark 5.4: Exponential stability under additional conditions
  • proof
  • Remark 5.5: Connections to other probabilistic stability notions
  • proof
  • ...and 15 more