Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States
Robert Lefringhausen, Supitsana Srithasan, Armin Lederer, Sandra Hirche
TL;DR
The paper addresses learning-based optimal control for unknown nonlinear systems with latent states, where only incomplete state measurements are available. It uses particle Markov chain Monte Carlo (PMCMC) to jointly infer the dynamics and the latent state trajectories, and scenario theory to derive probabilistic performance and constraint-satisfaction guarantees, both for fixed controllers and for a scenario-based optimal control problem (OCP) that optimizes the input trajectory. A key contribution is a formal guarantee, built on a finite set of scenarios and a support sub-sample, that bounds the probability of constraint violations and suboptimality. The method is demonstrated in simulations using models with known basis functions and models with Gaussian process (GP)-based basis functions. The result is a principled framework for safe, data-driven control when both the dynamics and the latent states are uncertain, aimed at safety-critical applications.
Abstract
As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While the Bayesian approaches prevalent for safety-critical applications usually rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and the latent state, making the quantification of uncertainties and the design of controllers with formal performance guarantees considerably more challenging. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states based on a combination of particle Markov chain Monte Carlo methods and scenario theory. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.
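To give a feel for the kind of statement involved in validating a fixed control law, the following minimal sketch evaluates the elementary sampling bound for a decision that is not optimized over the scenarios: if a controller satisfies the constraints on all N independently drawn scenarios, then with confidence at least 1 - beta its violation probability is at most eps = 1 - beta^(1/N). This is a textbook argument under an i.i.d. sampling assumption, not the bound derived in the paper; the guarantees for the optimized input trajectory rely on a support sub-sample and take a different form. The function names `violation_bound` and `scenarios_needed` are hypothetical.

```python
import math


def violation_bound(num_scenarios: int, beta: float) -> float:
    """Violation level eps such that (1 - eps)**N == beta.

    Assumes a FIXED controller (not tuned on the scenarios) that satisfied
    the constraints on all N scenarios, and assumes the scenarios are
    i.i.d. samples. Illustrative only; not the paper's bound.
    """
    if num_scenarios < 1:
        raise ValueError("num_scenarios must be at least 1")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return 1.0 - beta ** (1.0 / num_scenarios)


def scenarios_needed(eps: float, beta: float) -> int:
    """Smallest N with (1 - eps)**N <= beta under the same assumptions."""
    if not 0.0 < eps < 1.0 or not 0.0 < beta < 1.0:
        raise ValueError("eps and beta must lie strictly between 0 and 1")
    return math.ceil(math.log(beta) / math.log(1.0 - eps))


if __name__ == "__main__":
    # Bound on the violation probability after 100 constraint-satisfying scenarios.
    print(violation_bound(100, 0.01))
    # Number of scenarios required for eps = 0.05 at confidence 0.99.
    print(scenarios_needed(0.05, 0.01))
```

The bound tightens only slowly with N, which is one reason the number of scenarios drawn from the posterior matters in practice.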
