Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States

Robert Lefringhausen, Supitsana Srithasan, Armin Lederer, Sandra Hirche

TL;DR

The paper tackles learning-based optimal control for unknown nonlinear systems with latent states and incomplete state measurements. It uses particle Markov chain Monte Carlo to infer the dynamics and latent state trajectories, and scenario theory to derive probabilistic performance and constraint guarantees, both for fixed controllers and for a scenario-based optimal control problem (OCP) that optimizes the input trajectory. A key contribution is a formal guarantee mechanism based on a finite scenario set and a support sub-sample, which bounds the probability of constraint violation and suboptimality; it is demonstrated in simulations with known basis functions and with Gaussian process (GP)-based basis functions. The approach provides a principled framework for safe, data-driven control when both the dynamics and the latent states are uncertain, with applicability to safety-critical tasks.

Abstract

As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While the Bayesian approaches prevalent for safety-critical applications usually rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and the latent state, making the quantification of uncertainties and the design of controllers with formal performance guarantees considerably more challenging. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states based on a combination of particle Markov chain Monte Carlo methods and scenario theory. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.

Paper Structure

This paper contains 11 sections, 3 theorems, 12 equations, 3 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

For a given confidence parameter $\beta \in (0,1)$, under Assumptions as:sample_from_prior, as:iid_posterior, and as:indepentent_control law, it holds that … where $V_J$ denotes the probability that the cost (eq:cost), incurred when the control law $\bm{\pi}(\cdot)$ is applied to the unknown system, exceeds $\overline{J_H}$, i.e., $V_J = \mathbb{P}(J_H>\overline{J_H})$.

Proof: Due to Assumptions …
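The displayed inequality of Theorem 1 is not reproduced in this summary. As an illustration of the kind of statement involved, the sketch below uses the standard scenario-theory bound for the empirical maximum of $K$ i.i.d. scenario costs: if $\overline{J_H}$ is the largest of the $K$ sampled costs, then with confidence at least $1-\beta$ the violation probability satisfies $V_J \le 1 - \beta^{1/K}$. Whether this matches the exact expression in the paper is an assumption. The function names and the toy gamma-distributed costs are hypothetical stand-ins for costs obtained by simulating a fixed control law on posterior samples.

```python
import numpy as np


def violation_bound(num_scenarios: int, beta: float) -> float:
    """Epsilon such that P(V_J > epsilon) <= beta when the cost bound is
    the maximum over `num_scenarios` i.i.d. scenario costs."""
    if num_scenarios < 1:
        raise ValueError("need at least one scenario")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return 1.0 - beta ** (1.0 / num_scenarios)


def validate_control_law(scenario_costs, beta: float):
    """Return the empirical cost bound and the violation-probability bound."""
    costs = np.asarray(scenario_costs, dtype=float)
    cost_bound = float(costs.max())             # plays the role of \overline{J_H}
    epsilon = violation_bound(costs.size, beta)
    return cost_bound, epsilon


# Toy stand-in for K = 200 scenario costs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
toy_costs = rng.gamma(shape=2.0, scale=1.0, size=200)

cost_bound, epsilon = validate_control_law(toy_costs, beta=0.01)
print(f"cost bound: {cost_bound:.3f}, violation probability bound: {epsilon:.4f}")
```

The bound depends only on the number of scenarios and the confidence parameter, not on the cost distribution, so more scenarios tighten the guarantee at the price of more simulations.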

Figures (3)

  • Figure 1: Normalized auto-correlation function (ACF) between successive samples of the PG sampler without thinning. The red lines represent the ACF for the 10 different entries of the weight matrix $\bm{A}$, the blue lines represent the ACF for the 4 different entries of the process noise covariance matrix $\bm{Q}$, and the green lines represent the ACF for the 2 different entries of the state $\bm{x}_{-1}$.
  • Figure 2: Example of the optimal control with known basis functions. The red area shows the output constraints, the gray area encompasses the 200 scenarios that were used to determine the input trajectory, the green line shows the mean prediction, and the blue line shows one realization of the output of the actual system when the input trajectory $\bm{u}^\star_{0:H}$ is applied from time $t=0$.
  • Figure 3: Example of the optimal control with generic basis functions. The red area shows the output constraints, the gray area encompasses the 100 scenarios that were used to determine the input trajectory, the green line shows the mean prediction, and the blue line shows one realization of the output of the actual system when the input trajectory $\bm{u}^\star_{0:H}$ is applied from time $t=0$.
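Figure 1 reports the normalized auto-correlation between successive sampler outputs without thinning. A minimal sketch of how such a diagnostic could be computed for one scalar parameter is shown here. The AR(1) chain is a hypothetical stand-in for correlated MCMC output, not the paper's sampler, and `normalized_acf` and `thin` are illustrative helper names.

```python
import numpy as np


def normalized_acf(chain, max_lag: int) -> np.ndarray:
    """Normalized auto-correlation of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # lag-0 sum of squares, so the ACF at lag 0 equals 1
    return np.array(
        [np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


def thin(chain, step: int) -> np.ndarray:
    """Keep every `step`-th sample to reduce correlation between samples."""
    return np.asarray(chain)[::step]


# Toy correlated chain: AR(1) process with coefficient 0.9 (hypothetical data).
rng = np.random.default_rng(1)
phi = 0.9
chain = np.zeros(5000)
for t in range(1, chain.size):
    chain[t] = phi * chain[t - 1] + rng.normal()

acf_raw = normalized_acf(chain, max_lag=20)
acf_thinned = normalized_acf(thin(chain, step=10), max_lag=20)
print(f"lag-1 ACF without thinning: {acf_raw[1]:.2f}")
print(f"lag-1 ACF with thinning:    {acf_thinned[1]:.2f}")
```

A slowly decaying ACF indicates strongly correlated samples; thinning trades chain length for samples that are closer to independent, which matters when a guarantee relies on an i.i.d. assumption.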

Theorems & Definitions (7)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • Theorem 2
  • Remark 4
  • Theorem 3