Data-Driven Control via Conditional Mean Embeddings: Formal Guarantees via Uncertain MDP Abstraction

Ibon Gracia, Morteza Lahijanian

TL;DR

The paper tackles the challenge of guaranteeing performance for controlled stochastic systems with unknown dynamics and complex specifications. It introduces a data-driven framework that learns the transition kernel from trajectory data using conditional mean embeddings and abstracts the system into a finite uncertain MDP, capturing learning and discretization errors. Robust dynamic programming is then used to synthesize policies with formal reach-avoid bounds, which are back-mapped to the original system to provide provable guarantees. The approach is validated on a temperature regulation task, showing formal bounds and improved safety over prior CME-based methods, and it provides explicit finite-sample bounds and a sound abstraction mechanism that does not require prespecified sampling points.
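
The learning step can be illustrated with the standard regularized empirical CME estimator, in which conditional expectations become weighted sums over observed next states. This is a minimal sketch, not the authors' implementation: the kernel parameterization $k(a,b)=\sigma_f^2\exp(-\|a-b\|^2/(2\sigma_l^2))$, the regularizer `lam`, the toy dynamics, and all function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma_f=1.0, sigma_l=1.0):
    """Gram matrix of an assumed Gaussian kernel between rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return sigma_f**2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma_l**2))

def cme_weights(x_data, x_query, lam=1e-3, **kern):
    """Weights beta(x) = (K + n*lam*I)^{-1} k(X, x) of the empirical CME."""
    n = x_data.shape[0]
    K = gaussian_kernel(x_data, x_data, **kern)
    k_q = gaussian_kernel(x_data, x_query, **kern)
    return np.linalg.solve(K + n * lam * np.eye(n), k_q)  # shape (n, m)

# Hypothetical scalar system x+ = 0.8 x + w, used only to generate toy data.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 0.8 * x + 0.05 * rng.standard_normal((200, 1))

# Estimate E[x+ | x = 0.5] as a weighted sum of the observed next states.
beta = cme_weights(x, np.array([[0.5]]), lam=1e-4, sigma_l=0.5)
est_mean = float(beta[:, 0] @ y[:, 0])
print(est_mean)
```

The same weights can be applied to any function of the next state, for example the indicator of a region, which is how a CME yields approximate transition probabilities between regions.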

Abstract

Controlling stochastic systems with unknown dynamics and under complex specifications is especially challenging in safety-critical settings, where performance guarantees are essential. We propose a data-driven policy synthesis framework that yields formal performance guarantees for such systems using conditional mean embeddings (CMEs) and uncertain Markov decision processes (UMDPs). From trajectory data, we learn the system's transition kernel as a CME, then construct a finite-state UMDP abstraction whose transition uncertainties capture learning and discretization errors. Next, we generate a policy with formal performance bounds through robust dynamic programming. We demonstrate and empirically validate our method through a temperature regulation benchmark.
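
To make the robust dynamic programming step concrete, the sketch below computes a lower bound on a finite-horizon reach-avoid probability. It assumes the transition uncertainty is given as interval bounds `lo` and `hi` on each transition probability, which is one common special case of a UMDP and not necessarily the uncertainty set used in the paper. The toy model and all names are hypothetical.

```python
import numpy as np

def worst_case_expectation(lo, hi, v):
    """Minimize sum_j p[j] * v[j] subject to lo <= p <= hi and sum(p) = 1."""
    p = lo.copy()
    budget = 1.0 - p.sum()
    for j in np.argsort(v):              # push spare mass to low-value successors first
        add = min(hi[j] - lo[j], budget)
        p[j] += add
        budget -= add
    return float(p @ v)

def robust_reach_avoid(lo, hi, target, unsafe, horizon):
    """lo, hi: arrays of shape (S, A, S). Returns the value lower bound and first-step policy."""
    n_s, n_a, _ = lo.shape
    v = np.zeros(n_s)
    v[target] = 1.0                      # target states count as success
    policy = np.zeros(n_s, dtype=int)
    for _ in range(horizon):
        v_new = v.copy()
        for s in range(n_s):
            if s in target or s in unsafe:
                continue                 # absorbing: value stays 1 or 0
            q = [worst_case_expectation(lo[s, a], hi[s, a], v) for a in range(n_a)]
            policy[s] = int(np.argmax(q))
            v_new[s] = max(q)
        v = v_new
    return v, policy

# Toy abstraction: state 0 is a working region, state 1 the target, state 2 unsafe.
lo = np.zeros((3, 2, 3))
hi = np.zeros((3, 2, 3))
lo[0, 0], hi[0, 0] = [0.1, 0.6, 0.1], [0.3, 0.8, 0.2]
lo[0, 1], hi[0, 1] = [0.0, 0.4, 0.0], [0.5, 0.9, 0.5]
for s in (1, 2):                         # self-loops for the absorbing states
    lo[s, :, s] = hi[s, :, s] = 1.0

v, policy = robust_reach_avoid(lo, hi, target=[1], unsafe=[2], horizon=2)
print(v, policy)
```

The inner minimization is solved greedily because, for interval constraints, the adversary's best response is to move as much probability mass as the bounds allow onto the lowest-value successor states.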

Paper Structure

This paper contains 16 sections, 7 theorems, 14 equations, 1 figure, 1 table.

Key Result

lemma 1

Let $f(x,w)$ be $L$-Lipschitz continuous in $x$ and let $k$ be the Gaussian kernel. Then, for all $x, \tilde{x} \in X$ such that $\|x - \tilde{x}\| \le \eta$, with $\eta > 0$, it holds that $\mathrm{MMD}(\mathcal{T}(\cdot \mid \tilde{x}), \mathcal{T}(\cdot \mid x)) \le \frac{\sigma_f}{\sigma_l}L\eta$.
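
The bound in the lemma can be sanity-checked numerically. The sketch below assumes the Gaussian kernel is parameterized as $k(y,y')=\sigma_f^2\exp(-\|y-y'\|^2/(2\sigma_l^2))$, which this summary does not state explicitly, and uses a hypothetical scalar system $f(x,w)=L\sin(x)+w$ that is $L$-Lipschitz in $x$. Both next-state samples share the same noise draws, so the sample-based MMD is itself bounded by $\frac{\sigma_f}{\sigma_l}L\eta$ under this kernel.

```python
import numpy as np

def gauss_gram(a, b, sigma_f, sigma_l):
    """Gram matrix of the assumed Gaussian kernel for 1-D sample arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma_f**2 * np.exp(-d2 / (2.0 * sigma_l**2))

def mmd(a, b, sigma_f, sigma_l):
    """Biased (V-statistic) estimate of the MMD between two sample sets."""
    m2 = (gauss_gram(a, a, sigma_f, sigma_l).mean()
          - 2.0 * gauss_gram(a, b, sigma_f, sigma_l).mean()
          + gauss_gram(b, b, sigma_f, sigma_l).mean())
    return float(np.sqrt(max(m2, 0.0)))

# Hypothetical constants and dynamics, chosen only for illustration.
L, eta, sigma_f, sigma_l = 0.9, 0.1, 1.0, 0.5

def f(x, w):
    return L * np.sin(x) + w             # L-Lipschitz in x since |cos| <= 1

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.2, size=500)       # shared noise samples for both states
x, x_tilde = 0.3, 0.3 + eta

estimate = mmd(f(x_tilde, w), f(x, w), sigma_f, sigma_l)
bound = sigma_f / sigma_l * L * eta
print(estimate, bound)
```

In this example the estimate falls below the bound, as expected: the bound only uses the Lipschitz constant, while the noise further smooths the difference between the two next-state distributions.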

Figures (1)

  • Figure 1: Theoretical bounds $\underline{p},\overline{p}$ and empirical probability $\hat{p}$ as a function of the temperature $x\in X$

Theorems & Definitions (12)

  • definition 1: UMDP
  • lemma 1
  • theorem 1
  • definition 2: $\varphi_x$-Conservative Partition
  • definition 3: UMDP Abstraction
  • proposition 1
  • Remark 1
  • definition 4: Soundness
  • theorem 2: Soundness
  • proposition 2: RDP gracia2025efficient
  • ...and 2 more