Robustness to Model Approximation, Model Learning From Data, and Sample Complexity in Wasserstein Regular MDPs

Yichen Zhou, Yanglei Song, Serdar Yüksel

TL;DR

The paper develops a unified theory for robustness of discrete-time stochastic control under Wasserstein-1 model approximation, connecting performance loss from using an approximate model to the Wasserstein distance between true and approximate transition kernels. It establishes Lipschitz continuity of value functions with respect to model changes under Wasserstein regularity, and provides explicit bounds for both discounted and average-cost criteria. The results extend to empirical model learning and disturbance estimation, deriving finite-sample and asymptotic rates, including parametric $O(n^{-1/2})$ bounds in several scenarios and improved rates under additional regularity. The framework supports quantized approximations and simultaneous learning of finite models, with concrete sample complexities for single-trajectory and independent-transition data, and shows how robustness bounds propagate to noise-distribution misspecification and joint model-noise learning. These results are relevant to data-driven control and disturbance estimation, offering quantitative guarantees for learning-based controllers.

Abstract

The paper studies the robustness properties of discrete-time stochastic optimal control under Wasserstein model approximation for both discounted cost and average cost criteria. Specifically, we study the performance loss when applying an optimal policy designed for an approximate model to the true dynamics compared with the optimal cost for the true model under the sup-norm-induced metric, and relate it to the Wasserstein-1 distance between the approximate and true transition kernels. A primary motivation of this analysis is empirical model learning, as well as empirical noise distribution learning, where Wasserstein convergence holds under mild conditions but stronger convergence criteria, such as total variation, may not. We discuss applications of the results to the disturbance estimation problem, where sample complexity bounds are given, and also to a general empirical model learning approach, obtained under either Markov or i.i.d. learning settings.

Paper Structure

This paper contains 29 sections, 34 theorems, 155 equations, 2 algorithms.

Key Result

Lemma 1.1

Given a function $f:\mathbb{X}\to\mathbb{R}$ that is $\|f\|_{\text{Lip}}$-Lipschitz and two controlled transition kernels $\mathcal{T}$ and $\mathcal{S}$, we have
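
The displayed inequality of Lemma 1.1 is not included in this extract. Bounds of this kind typically follow from Kantorovich-Rubinstein duality, $\left|\int f\,d\mu - \int f\,d\nu\right| \le \|f\|_{\text{Lip}}\, W_1(\mu,\nu)$, applied with $\mu=\mathcal{T}(\cdot\mid x,u)$ and $\nu=\mathcal{S}(\cdot\mid x,u)$; whether the lemma takes exactly this form is an assumption. The sketch below checks the duality bound numerically for a single fixed state-action pair. The function names, the real-valued support, the probability vectors, and the test function are all hypothetical and chosen only for illustration.

```python
import math

# Numerical check of |E_T[f] - E_S[f]| <= Lip(f) * W1(T, S) on the real line.
# All names and numbers are illustrative assumptions, not taken from the paper.

def w1_on_line(support, p, q):
    """W1 between two distributions on a common sorted support in R.

    Uses the one-dimensional identity W1 = integral of |F_p(x) - F_q(x)| dx,
    where F_p and F_q are the cumulative distribution functions.
    """
    total, cdf_gap = 0.0, 0.0
    for i in range(len(support) - 1):
        cdf_gap += p[i] - q[i]  # F_p - F_q on [support[i], support[i+1])
        total += abs(cdf_gap) * (support[i + 1] - support[i])
    return total

def expectation(f, support, p):
    """Expected value of f under the discrete distribution p."""
    return sum(f(x) * w for x, w in zip(support, p))

support = [0.0, 1.0, 2.0, 3.0]
t_next = [0.1, 0.4, 0.3, 0.2]  # hypothetical T(. | x, u)
s_next = [0.2, 0.3, 0.3, 0.2]  # hypothetical S(. | x, u)

lip = 2.0

def f(x):
    # sin(lip * x) has derivative bounded by lip, so it is lip-Lipschitz.
    return math.sin(lip * x)

gap = abs(expectation(f, support, t_next) - expectation(f, support, s_next))
bound = lip * w1_on_line(support, t_next, s_next)
print(gap, bound)
```

Here the two kernels differ by moving probability mass 0.1 over a distance of 1, so the Wasserstein-1 distance is 0.1 and the bound evaluates to 0.2. A function with slope exactly `lip` between the two affected points would attain the bound, which is why the duality inequality cannot be improved in general.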

Theorems & Definitions (75)

  • Definition 1.1
  • Definition 1.2: Wasserstein-p Distance, Definition 3.1.1, Figalli2021
  • Remark 1.1
  • Lemma 1.1
  • Remark 1.2
  • Theorem 2.1: DCOE, Theorem 5.2.1 and 5.5.2, serdarlectures
  • Remark 2.1: Discounted Cost Bellman Consistency Equation
  • Definition 2.1: Minorization Condition for Kernels
  • Theorem 2.2: ACOE, Theorem 5.2.1 and 7.2.1, serdarlectures
  • Remark 2.2
  • ...and 65 more