Computing Optimal Joint Chance Constrained Control Policies

Niklas Schmid, Marta Fochesato, Sarah H. Q. Li, Tobias Sutter, John Lygeros

TL;DR

This work augments the system dynamics with a binary state, which makes it possible to characterize the optimal policies and to develop a dynamic programming-based solution method for optimally controlling stochastic Markovian systems subject to joint chance constraints over a finite time horizon.

Abstract

We consider the problem of optimally controlling stochastic, Markovian systems subject to joint chance constraints over a finite-time horizon. For such problems, standard Dynamic Programming is inapplicable due to the time correlation of the joint chance constraints, which calls for non-Markovian, and possibly stochastic, policies. Hence, despite the popularity of this problem, solution approaches that provide provably optimal and easy-to-compute policies are still missing. We fill this gap by augmenting the dynamics with a binary state, which allows us to characterize the optimal policies and to develop a Dynamic Programming-based solution method.
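
To make the binary-state idea concrete, here is a minimal sketch on a finite toy model. It is not the paper's algorithm: the function names (`lagrangian_dp`, `evaluate`), the flag update rule, the terminal reward of $\lambda$ for an intact flag, and the three-state example are all assumptions made for illustration. The sketch assumes the binary state records whether the trajectory has stayed inside a safe set so far, so that for a fixed multiplier $\lambda \geq 0$ the Lagrangian "cost minus $\lambda$ times joint safety probability" can be minimized by an ordinary backward recursion over the augmented state.

```python
import numpy as np

def lagrangian_dp(P, cost, safe, N, lam):
    """Backward recursion on the augmented state (x, b) for a fixed lam.

    Assumed conventions (illustrative, not taken from the paper):
    P[a, s, t] is the transition probability s -> t under action a,
    cost[s, a] is the stage cost, safe[s] marks safe states, and the
    flag b stays 1 only while every visited state has been safe.
    """
    S = P.shape[1]
    J = np.zeros((N + 1, S, 2))
    J[N, :, 1] = -lam  # terminal reward lam if the flag is still 1
    policy = np.zeros((N, S, 2), dtype=int)
    for k in range(N - 1, -1, -1):
        for b in (0, 1):
            # Value at the successor, indexed by its updated flag.
            nxt = np.where(safe & (b == 1), J[k + 1, :, 1], J[k + 1, :, 0])
            Q = cost + (P @ nxt).T  # shape (S, A)
            policy[k, :, b] = Q.argmin(axis=1)
            J[k, :, b] = Q.min(axis=1)
    return J, policy

def evaluate(P, cost, safe, N, policy):
    """Expected cost C and joint safety probability V of a Markov policy."""
    S = P.shape[1]
    idx = np.arange(S)
    C = np.zeros((S, 2))
    V = np.zeros((S, 2))
    V[:, 1] = 1.0  # flag still 1 at the final time means the run was safe
    for k in range(N - 1, -1, -1):
        C_new, V_new = np.empty_like(C), np.empty_like(V)
        for b in (0, 1):
            u = policy[k, :, b]
            Pu = P[u, idx, :]  # row s is the transition row of action u[s]
            keep = safe & (b == 1)
            C_new[:, b] = cost[idx, u] + Pu @ np.where(keep, C[:, 1], C[:, 0])
            V_new[:, b] = Pu @ np.where(keep, V[:, 1], V[:, 0])
        C, V = C_new, V_new
    return C, V

# Hypothetical toy model: states 0 and 1 are safe, state 2 is unsafe.
safe = np.array([True, True, False])
P = np.zeros((2, 3, 3))
P[0] = np.eye(3)                 # action 0 ("cautious"): stay put
P[1] = np.eye(3)
P[1, 0] = [0.8, 0.0, 0.2]        # action 1 ("risky"): may leave the safe set
cost = np.array([[1.0, 0.0]] * 3)  # cautious costs 1, risky costs 0
N = 3

J_free, pi_free = lagrangian_dp(P, cost, safe, N, lam=0.0)
C_free, V_free = evaluate(P, cost, safe, N, pi_free)
J_safe, pi_safe = lagrangian_dp(P, cost, safe, N, lam=100.0)
C_safe, V_safe = evaluate(P, cost, safe, N, pi_safe)
```

Starting from state 0 with the flag set, `C_free[0, 1]` and `V_free[0, 1]` give the cost and joint safety of the policy obtained with $\lambda = 0$, and `C_safe[0, 1]`, `V_safe[0, 1]` those of the policy obtained with a large multiplier. Because the flag is part of the state, both policies are Markov in the augmented state even though they depend on the trajectory history in the original state.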

Paper Structure

This paper contains 15 sections, 10 theorems, 44 equations, 9 figures, and 1 algorithm.

Key Result

Theorem III.1

(Attainability of Deterministic Markov Policies) Given a fixed $\lambda \in \mathbb{R}_{\geq 0}$, there exists a measurable deterministic Markov policy that attains the infimum in \ref{eq_our_dp_recursion_innerdual} at every time step $k\in[N]$ and is also an optimal solution to Problem \ref{eq_inner_problem_d}.

Figures (9)

  • Figure 1: Graphical representation of the paper structure.
  • Figure 2: An illustration of an objective function for the outer maximization in Problem \ref{eq_lagrange_dual_over_stochasticCausal} (inspired by Ono_2). The policy $\pi_{\lambda}$ denotes an optimal argument of the inner minimization of Problem \ref{eq_lagrange_dual_over_stochasticCausal} for a given $\lambda$ (assuming it exists), and $C_0^{\pi_{\lambda}}(\Tilde{x}_0)$, $V_0^{\pi_{\lambda}}(\Tilde{x}_0)$ denote the cost and safety associated with that policy.
  • Figure 3: A Performance Set $P_{\Pi_{\text{mix}},\Tilde{x}_0}$ and Pareto front $P_{\Pi_{\text{mix}},\Tilde{x}_0}^{\star}$.
  • Figure 4: In the left plot, we can choose from policies which have a safety arbitrarily close to $\alpha$ and a control cost of $C$ or a policy that attains $\alpha$ but at cost $C+\delta$, $\delta>0$. Then, for any $\lambda$, there always exists a policy $\pi$ with safety close enough to $\alpha$ such that it is not optimal to incur the additional cost $\delta$, i.e., $\lambda(\alpha-V^{\pi}_0)<\delta$. The right plot depicts a similar border case.
  • Figure 5: Performance set $P_{\Pi_{\text{d}},\Tilde{x}_0}$ (bordered set) and its convex hull $P_{\Pi_{\text{mix}},\Tilde{x}_0}$ (grey set), as well as the performance of the respective policies $\pi_{\underline{\lambda}},\pi_{\overline{\lambda}}$ (black stars), the performance of all mixed policies constructible from $\pi_{\underline{\lambda}}$ and $\pi_{\overline{\lambda}}$ (red line), and the optimal interpolation $\pi_{\text{mix}}$ according to equation \ref{eq_stochastic_policies_ratios} (red star). The variable $\lambda$ sets the optimization direction, and the DP recursion returns the policy in the performance set that is optimal in this direction. Over the outer loop iterations, $\underline{\lambda}$ approaches $\overline{\lambda}$.
  • ...and 4 more figures
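
The interplay between the outer loop over $\lambda$ and the mixed policy described in the Figure 5 caption can be sketched on a finite performance set. This is a hedged illustration only: the bisection rule, the function names (`solve_inner`, `bisect_and_mix`), the toy (cost, safety) pairs, and the interpolation formula are assumptions, since the paper's equation for the mixing ratio is not reproduced here. The sketch assumes the mixed policy picks one of two deterministic policies at time zero, so its cost and safety are convex combinations of theirs.

```python
def solve_inner(lam, perf):
    """Toy inner problem: the (cost, safety) pair minimizing cost - lam * safety."""
    return min(perf, key=lambda cv: cv[0] - lam * cv[1])

def bisect_and_mix(perf, alpha, lam_hi=1e3, iters=60):
    """Bisect on lam, then mix the two bracketing policies to reach safety alpha.

    Returns (p, cost, safety), where p is the probability of running the
    safer policy. Assumed scheme for illustration, not the paper's algorithm.
    """
    lam_lo = 0.0
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        _, v = solve_inner(lam, perf)
        if v >= alpha:
            lam_hi = lam  # safe enough: try a smaller multiplier
        else:
            lam_lo = lam  # not safe enough: increase the multiplier
    c_lo, v_lo = solve_inner(lam_lo, perf)
    c_hi, v_hi = solve_inner(lam_hi, perf)
    if v_lo >= alpha:
        return 0.0, c_lo, v_lo  # the unconstrained optimum is already safe
    if v_hi < alpha:
        raise ValueError("no policy in the set reaches the requested safety")
    p = (alpha - v_lo) / (v_hi - v_lo)  # weight on the safer policy
    return p, (1 - p) * c_lo + p * c_hi, (1 - p) * v_lo + p * v_hi

# Hypothetical performance set of four deterministic policies: (cost, safety).
perf = [(0.0, 0.512), (1.0, 0.64), (2.0, 0.8), (3.0, 1.0)]
alpha = 0.9
p, c_mix, v_mix = bisect_and_mix(perf, alpha)
```

In this toy set, no deterministic policy has safety exactly 0.9, and the only deterministic policy that satisfies the constraint costs 3. The mixed policy meets the constraint with equality at a lower expected cost, which is the effect the convex hull in Figure 5 depicts.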

Theorems & Definitions (21)

  • Theorem III.1
  • proof
  • Definition III.2
  • Proposition III.3
  • proof
  • Corollary III.4
  • Lemma III.5: Attainability of Mixed Policies
  • proof
  • Corollary III.6
  • proof
  • ...and 11 more