Data-driven Interval MDP for Robust Control Synthesis

Rudi Coppola; Andrea Peruffo; Licio Romao; Alessandro Abate; Manuel Mazo

TL;DR

This work addresses data-driven robust control synthesis for discrete-time stochastic systems with an unknown noise distribution by constructing finite-state abstractions that support reach-avoid specifications. It extends prior data-driven abstractions by allowing nondeterministic transitions through multiple-target sets, which yields a robust MDP (RMDP) that is then embedded into an interval MDP (iMDP) so that PAC guarantees can be obtained from samples. A key contribution is the use of scenario-based bounds to compute PAC probability intervals for the uncertain transitions, which give valid lower bounds on the satisfaction probability of the original system after policy refinement. Experiments on a double integrator and related models show that multi-target transitions allow richer controller synthesis at the cost of larger abstractions; the paper discusses these tradeoffs and future directions for exploiting the structure of the uncertainty.

Abstract

The abstraction of dynamical systems is a powerful tool that enables the design of feedback controllers using a correct-by-design framework. We investigate a novel scheme to obtain data-driven abstractions of discrete-time stochastic processes in terms of richer discrete stochastic models, whose actions lead to nondeterministic transitions over the space of probability measures. The data-driven component of the proposed methodology lies in the fact that we only assume samples from an unknown probability distribution. We also rely on the model of the underlying dynamics to build our abstraction through backward reachability computations. The nondeterminism in the probability space is captured by a collection of Markov Processes, and we identify how this model can improve upon existing abstraction techniques in terms of satisfying temporal properties, such as safety or reach-avoid. The connection between the discrete and the underlying dynamics is made formal through the use of the scenario approach theory. Numerical experiments illustrate the advantages and main limitations of the proposed techniques with respect to existing approaches.

Paper Structure (14 sections, 2 theorems, 25 equations, 6 figures, 1 table)

Introduction
Notation and Preliminaries
Stochastic Difference Equations
Reach-avoid Specifications
Markov Models
Finite-State Abstraction
State Space Discretization
Actions
Transition Probabilities
Shortcomings and Motivating Example
Uncertain Transition Probabilities
PAC Probability Intervals via Sampling
Experimental Evaluation
Discussion and Conclusions

Key Result

Theorem 1

(PAC probability intervals BRAPPSJ23) Given $Z$ samples of the noise $W_k$, compute $Z_{s_j}^{\text{out}}$ and fix a confidence parameter $\beta$. It holds that where ${\underline{p}_{j,\ell}}= 0$ if $Z_{s_j}^{\text{out}}=Z$; otherwise ${\underline{p}_{j,\ell}}$ is the solution of and ${\overline{p}_{j,\ell}}=1$ if $Z_{s_j}^{\text{out}}=0$; otherwise ${\overline{p}_{j,\ell}}$ is the solution of
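The displayed equations of Theorem 1 are not reproduced in this summary. As a rough illustration only, the sketch below assumes the binomial-tail form that is common in scenario-approach results: with target $\beta/(2Z)$, the lower bound solves $\beta/(2Z)=\sum_{i=0}^{Z^{\text{out}}}\binom{Z}{i}(1-p)^i p^{Z-i}$ and the upper bound solves $\beta/(2Z)=1-\sum_{i=0}^{Z^{\text{out}}-1}\binom{Z}{i}(1-p)^i p^{Z-i}$. This form, the function names, and the bisection solver are assumptions made for illustration and may differ from the paper's exact statement.

```python
from math import exp, lgamma, log


def prob_at_most_k_outside(k, Z, p):
    """P(at most k of Z i.i.d. samples land outside the region),
    when each sample lands inside with probability p."""
    if p <= 0.0:
        return 1.0 if k >= Z else 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for i in range(k + 1):
        # log of the binomial coefficient C(Z, i), computed in log-space
        log_binom = lgamma(Z + 1) - lgamma(i + 1) - lgamma(Z - i + 1)
        total += exp(log_binom + i * log_q + (Z - i) * log_p)
    return min(total, 1.0)


def _solve(f, target, increasing, iters=100):
    """Bisection on [0, 1] for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pac_interval(Z, Z_out, beta):
    """Hypothetical helper: interval [p_low, p_up] under the ASSUMED
    binomial-tail equations described in the lead-in."""
    if not (0 <= Z_out <= Z) or not (0.0 < beta < 1.0):
        raise ValueError("need 0 <= Z_out <= Z and 0 < beta < 1")
    target = beta / (2 * Z)
    if Z_out == Z:
        p_low = 0.0
    else:
        # Left-hand tail is increasing in p
        p_low = _solve(lambda p: prob_at_most_k_outside(Z_out, Z, p),
                       target, increasing=True)
    if Z_out == 0:
        p_up = 1.0
    else:
        # Right-hand tail is decreasing in p
        p_up = _solve(lambda p: 1.0 - prob_at_most_k_outside(Z_out - 1, Z, p),
                      target, increasing=False)
    return p_low, p_up
```

For instance, `pac_interval(100, 20, 0.01)` returns an interval that contains the empirical fraction 0.8 of samples landing inside the region; the interval tightens as the number of samples grows.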

Figures (6)

Figure 1: (a) Partition $\mathcal{Q}$ of the domain of interest $\mathcal{X}$, where $Q_i$ and $Q_j$ are two elements of the partition, $c_i$ and $c_j$ are the respective reference points. (b) Cover of target sets $\mathcal{T}$ of the partition $\mathcal{Q}$. Each color represents a different target set.
Figure 2: (a) $\text{Pre}(C_r)$ represented as the union of $\text{Pre}(c_o)$, $\text{Pre}(c_p)$, and $\text{Pre}(c_q)$; $a_r\in\mathcal{A}(s_i)$ (b) Assuming the ordering $o < p < q$, the partition $Q_i^r$ induced on $Q_i$ by $C_{r}$.
Figure 3: (a) Consider $T_j=Q_j$. If action $a_j\in\mathcal{A}(s_i)$ then $\text{Pre}(c_j)\supseteq Q_i$. (b) For every $x_k\in Q_i$ there exists an input $u_k=u^*(x,a_j)$ driving the state to $c_j$. The shaded area represents the support of $T(dx_{k+1}|x_k,u_k)$.
Figure 4: The dynamics are misaligned with the partition.
Figure 5: (a) Computation of $P^\ell(s_i,a_r)(s_q)$ as per \ref{eq:individual-transition-probability} for $\ell=1,2,3$. (b) Uncertain transition probability function from a state $s_i$ to a state $s_q$ under action $a_r$.
...and 1 more figure

Theorems & Definitions (10)

Definition 1
Definition 2
Definition 3: givan2000bounded
Definition 4
Remark 1
Remark 2
Example 1
Remark 3
Theorem 1
Theorem 2: Adapted from BRAJ23a
