PAC-Bayesian Optimal Control with Stability and Generalization Guarantees

Mahrokh Ghoddousi Boroujeni, Clara Lucía Galimberti, Andreas Krause, Giancarlo Ferrari-Trecate

TL;DR

This work addresses generalization in stochastic nonlinear optimal control (SNOC) by extending PAC-Bayes theory to this setting and providing a randomized framework with stability guarantees. It introduces a Gibbs posterior over stabilizing controller parameters, coupled with tractable approximations (Stein variational gradient descent (SVGD), normalizing flows) and a two-stage inference scheme to tighten bounds. The approach yields a principled balance between empirical performance and prior knowledge, ensuring closed-loop stability via expressive neural controllers (REN/SSM) and data-driven priors. Empirical results on an LTI system and cooperative robotics tasks demonstrate improved generalization over empirical SNOC and highlight scalability and practical performance gains.

Abstract

Stochastic Nonlinear Optimal Control (SNOC) seeks to minimize a cost function that accounts for random disturbances acting on a nonlinear dynamical system. Since the expectation over all disturbances is generally intractable, a common surrogate is the empirical cost, obtained by averaging over a finite dataset of sampled noise realizations. This substitution, however, introduces the challenge of guaranteeing performance under unseen disturbances. The issue is particularly severe when the dataset is limited, as the trained controllers may overfit, leading to substantial gaps between their empirical cost and the deployment cost. In this work, we develop a PAC-Bayesian framework that establishes rigorous generalization bounds for SNOC. Building on these bounds, we propose a principled controller design method that balances empirical performance and prior knowledge. To ensure tractability, we derive computationally efficient relaxations of the bounds and employ approximate inference methods. Our framework further leverages expressive neural controller parameterizations, guaranteeing closed-loop stability. Through simulated examples, we highlight how prior knowledge can be incorporated into control design and how more reliable controllers can be synthesized for cooperative robotics.
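The gap between empirical and deployment cost described above can be illustrated with a minimal sketch. Everything in it is an illustrative assumption rather than the paper's setup: a hypothetical scalar system $x_{t+1} = a x_t + u_t + w_t$ with a static gain $u_t = -k x_t$, a quadratic stage cost, Gaussian noise, and the function names `empirical_cost` and `gibbs_posterior`. It evaluates the empirical cost on a small dataset of noise sequences and forms a Gibbs posterior on a grid of gains, $\mathcal{Q}(k) \propto \mathcal{P}(k)\exp(-\lambda \hat{\mathcal{L}}(k))$, where $\hat{\mathcal{L}}$ is our notation for the empirical cost.

```python
import numpy as np

def empirical_cost(k, dataset, a=0.9, x0=1.0):
    """Average quadratic cost of u_t = -k * x_t over a batch of noise sequences.

    dataset has shape (S, T): S noise sequences of length T.
    The toy system x_{t+1} = a*x_t + u_t + w_t is an assumption for illustration.
    """
    n_seq, horizon = dataset.shape
    x = np.full(n_seq, x0)
    total = np.zeros(n_seq)
    for t in range(horizon):
        u = -k * x
        total += x**2 + 0.1 * u**2      # stage cost (assumed quadratic)
        x = a * x + u + dataset[:, t]   # one step of the closed loop
    return float(np.mean(total / horizon))

def gibbs_posterior(prior, costs, lam):
    """Discrete Gibbs posterior: q_j proportional to prior_j * exp(-lam * costs_j)."""
    log_q = np.log(prior) - lam * costs
    log_q -= log_q.max()                # shift for numerical stability
    q = np.exp(log_q)
    return q / q.sum()

rng = np.random.default_rng(0)
T = 20
train = rng.normal(0.0, 0.3, size=(8, T))      # small training dataset
fresh = rng.normal(0.0, 0.3, size=(2000, T))   # stand-in for unseen disturbances

gains = np.linspace(0.0, 1.5, 31)              # |a - k| < 1 on this grid
prior = np.full(gains.size, 1.0 / gains.size)  # uniform prior over the grid

train_costs = np.array([empirical_cost(k, train) for k in gains])
fresh_costs = np.array([empirical_cost(k, fresh) for k in gains])
posterior = gibbs_posterior(prior, train_costs, lam=8.0)

k_emp = gains[np.argmin(train_costs)]
gap = fresh_costs[np.argmin(train_costs)] - train_costs.min()
print(f"empirical minimizer k = {k_emp:.2f}, fresh-minus-train cost = {gap:.4f}")
print(f"posterior-averaged fresh cost = {posterior @ fresh_costs:.4f}")
```

With $\lambda = 0$ the posterior reduces to the prior, and as $\lambda$ grows it concentrates on the empirical minimizer, which is one way to read the balance between prior knowledge and empirical performance.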

Paper Structure

This paper contains 33 sections, 7 theorems, 27 equations, 8 figures, 2 tables.

Key Result

Theorem 1

Let $\mathbb{S}$ be a dataset consisting of $S$ noise sequences sampled from $\mathcal{D}_{T:0}$. Fix a prior $\mathcal{P}$ independent of $\mathbb{S}$ and any posterior $\mathcal{Q}$. Then, for any $\lambda > 0$, confidence level $\delta \in (0,1)$, and controller parameters $\theta \sim \mathcal{Q}$, ... Each inequality holds with probability at least $1-\delta$, and both hold simultaneously with proba...

Figures (8)

  • Figure 1: Two-stage inference approach. In the first stage, a subset $\mathbb{S}_1 \subset \mathbb{S}$ together with a data-independent prior $\mathcal{P}_1 = \mathcal{P}$ is used to infer a posterior $\mathcal{Q}_1$. In the second stage, the remaining data $\mathbb{S}_2 = \mathbb{S} \setminus \mathbb{S}_1$ is combined with $\mathcal{P}_2 = \mathcal{Q}_1$ to infer a posterior $\mathcal{Q}_2$. Both stages follow the inference procedure of \ref{corol:qstar}. When the Gibbs parameters $\lambda_1$ and $\lambda_2$ are chosen according to \ref{prop:twostage}, the resulting posterior coincides with that of the single-stage procedure, i.e., $\mathcal{Q}_2 = \mathcal{Q}^*$.
  • Figure 2: Overview of the components in our framework, highlighting their role in building a tractable and theoretically grounded pipeline. The approximation methods introduced in \ref{sec:approx} replace intractable components with tractable counterparts.
  • Figure 3: Approximating a one-dimensional target distribution (red) over a variable $x$ using SVGD (top) and normalizing flows (bottom). In SVGD, particles (cyan crosses) are initialized uniformly and move toward high-density regions after training. In normalizing flows, both the Gaussian base distribution (magenta) and the transformations are trainable, allowing the transformed distribution (cyan) to better align with the target after training.
  • Figure 4: Discretized PDFs of the prior distributions (left) and the optimal posterior distributions for $S = 8$ (middle) and $S = 512$ (right). The top and bottom rows correspond to $\mathcal{P}_\mathcal{U}$ and $\mathcal{P}_\mathcal{N}$, respectively. In each plot, the horizontal and vertical axes represent $\beta$ and $k$, respectively, and the color encodes the PDF value. The empirical and benchmark controllers are indicated by markers.
  • Figure 5: Comparison of the transformed true cost $\mathcal{L}$ and the upper bound in \ref{corol:ub_qstar} for various configurations as a function of $S$. Colors indicate the choices of $\delta$ and prior distribution $\mathcal{P}$. For each configuration, the true cost is approximated for $10$ parameter vectors $\theta$ sampled from $\mathcal{Q}^*$, shown as vertically aligned circles.
  • ...and 3 more figures
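
The two-stage scheme of Figure 1 can be checked numerically on a discretized parameter grid. The sketch below assumes the Gibbs form $\mathcal{Q} \propto \mathcal{P}\exp(-\lambda \hat{\mathcal{L}})$ with $\hat{\mathcal{L}}$ the average per-sample cost, and splits the Gibbs parameter as $\lambda_1 = \lambda S_1 / S$ and $\lambda_2 = \lambda S_2 / S$. This split is an assumption chosen so that the identity $\mathcal{Q}_2 = \mathcal{Q}^*$ holds under these definitions; it is not quoted from the proposition. The per-sample costs are random placeholders, and the helper name `gibbs` is hypothetical.

```python
import numpy as np

def gibbs(prior, avg_cost, lam):
    """Discrete Gibbs update: q_j proportional to prior_j * exp(-lam * avg_cost_j)."""
    log_q = np.log(prior) - lam * avg_cost
    log_q -= log_q.max()                 # shift for numerical stability
    q = np.exp(log_q)
    return q / q.sum()

rng = np.random.default_rng(1)
n_theta, S, S1 = 50, 12, 4               # grid size, dataset size, first-stage size
S2 = S - S1

# costs[i, j]: placeholder cost of grid point theta_j on noise sample i
costs = rng.uniform(0.0, 1.0, size=(S, n_theta))
prior = rng.dirichlet(np.ones(n_theta))  # arbitrary data-independent prior
lam = 5.0

# Single-stage posterior using all S samples
q_star = gibbs(prior, costs.mean(axis=0), lam)

# Two-stage posterior: stage 1 on the first S1 samples, stage 2 on the rest,
# with the stage-1 posterior used as the stage-2 prior
lam1 = lam * S1 / S                      # assumed split of the Gibbs parameter
lam2 = lam * S2 / S
q1 = gibbs(prior, costs[:S1].mean(axis=0), lam1)
q2 = gibbs(q1, costs[S1:].mean(axis=0), lam2)

max_diff = float(np.max(np.abs(q2 - q_star)))
print(f"max |Q2 - Q*| over the grid = {max_diff:.2e}")
```

The agreement follows because the exponents add: $\lambda_1 \hat{\mathcal{L}}_1 + \lambda_2 \hat{\mathcal{L}}_2 = \frac{\lambda}{S}\sum_{i=1}^{S} \ell_i = \lambda \hat{\mathcal{L}}$, where $\ell_i$ denotes the per-sample cost in our notation.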

Theorems & Definitions (10)

  • Definition 1: $\ell_p$-sequences
  • Definition 2: $\ell_p$-stable mappings
  • Definition 3: $\ell_p$-stable closed-loop systems
  • Theorem 1
  • Corollary 1: Lemma 1.1.3 in Catoni
  • Corollary 2
  • Proposition 1
  • Proposition 2
  • Corollary 3
  • Lemma 1: McDiarmid's inequality concentration