Table of Contents
Fetching ...

Formal Verification of Unknown Stochastic Systems via Non-parametric Estimation

Zhi Zhang, Chenyu Ma, Saleh Soudijani, Sadegh Soudjani

TL;DR

This work tackles formal verification of unknown stochastic systems by combining non-parametric density estimation with Lipschitz-constant (LC) analysis to construct finite IMDP abstractions. It introduces an LC estimation method with asymptotic convergence $O(n^{- rac{1}{3+d}})$ and develops two data-driven IMDP construction routes: empirical bounds via Chebyshev concentration and non-parametric bounds via conditional densities, both enabling provable closeness between the original system and its abstraction. The framework supports PCTL verification and policy synthesis on the IMDP, demonstrated through case studies on unknown linear and switched systems that validate accuracy and robustness against discretization. Overall, the approach enables safety-critical verification and synthesis for systems with unknown models using only observational data, with actionable guarantees on approximation error and convergence.

Abstract

A novel data-driven method for formal verification is proposed to study complex systems operating in safety-critical domains. The proposed approach is able to formally verify discrete-time stochastic dynamical systems against temporal logic specifications only using observation samples and without the knowledge of the model, and provide a probabilistic guarantee on the satisfaction of the specification. We first propose the theoretical results for using non-parametric estimation to estimate an asymptotic upper bound for the \emph{Lipschitz constant} of the stochastic system, which can determine a finite abstraction of the system. Our results prove that the asymptotic convergence rate of the estimation is $O(n^{-\frac{1}{3+d}})$, where $d$ is the dimension of the system and $n$ is the data scale. We then construct interval Markov decision processes using two different data-driven methods, namely non-parametric estimation and empirical estimation of transition probabilities, to perform formal verification against a given temporal logic specification. Multiple case studies are presented to validate the effectiveness of the proposed methods.

Formal Verification of Unknown Stochastic Systems via Non-parametric Estimation

TL;DR

This work tackles formal verification of unknown stochastic systems by combining non-parametric density estimation with Lipschitz-constant (LC) analysis to construct finite IMDP abstractions. It introduces an LC estimation method with asymptotic convergence and develops two data-driven IMDP construction routes: empirical bounds via Chebyshev concentration and non-parametric bounds via conditional densities, both enabling provable closeness between the original system and its abstraction. The framework supports PCTL verification and policy synthesis on the IMDP, demonstrated through case studies on unknown linear and switched systems that validate accuracy and robustness against discretization. Overall, the approach enables safety-critical verification and synthesis for systems with unknown models using only observational data, with actionable guarantees on approximation error and convergence.

Abstract

A novel data-driven method for formal verification is proposed to study complex systems operating in safety-critical domains. The proposed approach is able to formally verify discrete-time stochastic dynamical systems against temporal logic specifications only using observation samples and without the knowledge of the model, and provide a probabilistic guarantee on the satisfaction of the specification. We first propose the theoretical results for using non-parametric estimation to estimate an asymptotic upper bound for the \emph{Lipschitz constant} of the stochastic system, which can determine a finite abstraction of the system. Our results prove that the asymptotic convergence rate of the estimation is , where is the dimension of the system and is the data scale. We then construct interval Markov decision processes using two different data-driven methods, namely non-parametric estimation and empirical estimation of transition probabilities, to perform formal verification against a given temporal logic specification. Multiple case studies are presented to validate the effectiveness of the proposed methods.
Paper Structure (41 sections, 10 theorems, 115 equations, 8 figures, 4 tables, 1 algorithm)

This paper contains 41 sections, 10 theorems, 115 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Lemma 3.1

Suppose that $f_{Y|X}(y,x)$ satisfies Assumption ass_1(a). For any $(x,y)\in D_X\times D_Y$, $h_{\mathsf x},h_{\mathsf y}>0$, we have that for large $n$ if $nh_{\mathsf x}^{3}h_{\mathsf y}\to +\infty$ and $h_{\mathsf x},h_{\mathsf y}\to 0$ as $n\to +\infty$, then where $\lesssim$ denotes an asymptotic bound for large $n$ and $C_{1}:=\mathrm{Vol}(D_X)G_{20}(K)C_{f}$ with $\mathrm{Vol}(\cdot)$ indi

Figures (8)

  • Figure 1: Workflow of the proposed data-driven formal verification framework.
  • Figure 2: The upper and lower bounds on the probability of satisfying the specification by the linear system with $\delta=0.4$ in the left and $\delta=0.1$ in the right, and $O=[1.2,2]\times [1.6,2]$ and $D=[0,0.8]\times [0,0.4]$. The panels (a) and (b) show the results from a model-based approach. The panels (c) and (d) show the results of the data-driven approximation using the empirical approach. The panels (e) and (f) are for NPE.
  • Figure 3: The upper and lower bounds on the probability of satisfying the specification by the switched system with $\delta=0.4$ in the left and $\delta=0.1$ in the right, and $O=[1.2,2]\times [1.6,2]$ and $D=[0,0.8]\times [0,0.4]$. The panels (a) and (b) show the results from a model-based approach. The panels (c) and (d) show the results of the data-driven approximation using the empirical approach. The panels (e) and (f) are for NPE.
  • Figure 4: (a) The original CoDF. (b) Estimated CoDF with $h_{\mathsf x}=h_{\mathsf y}=n^{-\frac{1}{8}}$ and data scale $n=6\times 10^{4}$. (c) Estimated CoDF with bandwidth from the Scott's formula and data scale $n=6\times 10^{4}$. (d) Asymptotic bound on the original LC provided by Theorem \ref{['LC_main']} as a function of data scale $n$. The dashed line is the original LC of the CoDF, $L=0.1210$. (e) The estimated LC averaged over 150 computations with the grey area indicating the 99% empirical confidence interval (3 times the empirical standard deviation from the mean).
  • Figure 5: (a) The original CoDF. (b) Estimated CoDF with $h_{\mathsf x}=h_{\mathsf y}=n^{-\frac{1}{8}}$ and data scale $n=6\times 10^{4}$. (c) Estimated CoDF with bandwidth from the Scott's formula and data scale $n=6\times 10^{4}$. (d) Asymptotic bound on the original LC provided by Theorem \ref{['LC_main']} as a function of data scale $n$. The dashed line is the original LC of the CoDF, $L=0.0968$. (e) The estimated LC averaged over 150 computations with the grey area indicating the 99% empirical confidence interval (3 times the empirical standard deviation from the mean).
  • ...and 3 more figures

Theorems & Definitions (32)

  • Definition 2.1: IMDP
  • Definition 2.2: Adversary
  • Remark
  • Lemma 3.1
  • Lemma 3.2
  • Theorem 3.3
  • Theorem 3.4: Bias of the Estimation
  • Remark
  • Remark
  • Lemma 4.1
  • ...and 22 more