Consistent inverse optimal control for discrete-time nonlinear stochastic systems
Ziliang Wang, Han Zhang, Axel Ringh
TL;DR
The paper tackles inverse optimal control for discrete-time nonlinear stochastic systems by reformulating the forward problem in terms of occupancy measures as an infinite-dimensional linear program. It then derives a finite-dimensional, convex sum-of-squares estimator via polynomial approximation, and proves asymptotic and statistical consistency as the amount of data and the polynomial order grow. Numerical experiments on linear, nonlinear, and chaotic-like systems validate accuracy, robustness, and generalization, and show advantages over behavior cloning for policy reconstruction. The approach offers a scalable, theoretically grounded IOC framework that handles process noise, nonlinear dynamics, and infinite-horizon discounted costs.
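The occupancy-measure reformulation can be illustrated in the simplest possible setting: a finite state-action space. The sketch below makes that simplifying assumption. The paper itself treats continuous, nonlinear stochastic dynamics. The random transition matrices, feature arrays `phi`, and weights `theta` are hypothetical. For a fixed stationary policy, the sketch computes the normalized discounted occupancy measure `mu`. It checks the linear flow constraint on which the linear program is built. It also checks that the expected discounted cost equals `<mu, c> / (1 - gamma)`, which is linear both in `mu` and in the feature weights `theta`.

```python
import numpy as np

def occupancy_measure(P, policy, nu0, gamma):
    """Normalized discounted state-action occupancy measure of a stationary policy.

    Finite-MDP illustration only (an assumption; the paper uses continuous spaces).
    P has shape (A, S, S) with P[a, s, t] = Prob(t | s, a), policy has shape (S, A)
    with rows summing to 1, nu0 is the initial distribution, gamma is in (0, 1).
    """
    S = nu0.shape[0]
    # State transition matrix under the policy: P_pi[s, t] = sum_a pi(a|s) P[a, s, t]
    P_pi = np.einsum("sa,ast->st", policy, P)
    # Discounted state occupancy d solves (I - gamma P_pi)^T d = (1 - gamma) nu0
    d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, nu0)
    # State-action occupancy mu(s, a) = d(s) pi(a|s)
    return d[:, None] * policy

def discounted_cost_direct(P, policy, nu0, gamma, cost):
    """E[sum_t gamma^t c(s_t, a_t)] computed by exact policy evaluation."""
    S = nu0.shape[0]
    P_pi = np.einsum("sa,ast->st", policy, P)
    c_pi = np.sum(policy * cost, axis=1)          # expected one-step cost per state
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    return nu0 @ V

rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)                 # make each P[a, s, :] a distribution
policy = rng.random((S, A))
policy /= policy.sum(axis=1, keepdims=True)
nu0 = np.full(S, 1.0 / S)

# Hypothetical feature functions phi_i(s, a); the cost c = sum_i theta_i phi_i
phi = rng.random((3, S, A))
theta = np.array([1.0, 0.5, 2.0])
cost = np.tensordot(theta, phi, axes=1)           # shape (S, A)

mu = occupancy_measure(P, policy, nu0, gamma)

# Linear flow constraint defining the feasible set of the occupancy-measure LP
inflow = (1 - gamma) * nu0 + gamma * np.einsum("sa,ast->t", mu, P)
print(np.allclose(mu.sum(axis=1), inflow))

# The cost is linear in mu ...
J_mu = np.sum(mu * cost) / (1 - gamma)
print(np.isclose(J_mu, discounted_cost_direct(P, policy, nu0, gamma, cost)))

# ... and linear in theta: J = sum_i theta_i <mu, phi_i> / (1 - gamma)
J_theta = theta @ np.tensordot(phi, mu, axes=([1, 2], [0, 1])) / (1 - gamma)
print(np.isclose(J_mu, J_theta))
```

This bilinearity in `mu` and `theta` is what makes a linear-programming view natural on both the forward side and the inverse side. The infinite-dimensional, continuous-state version and its sum-of-squares approximation are not reproduced here.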
Abstract
Inverse Optimal Control (IOC) seeks to recover an unknown cost function from expert demonstrations. It provides a systematic way to model an expert's decision mechanism while incorporating prior information about the cost function. Nevertheless, existing IOC estimators can be inconsistent in noisy and nonlinear settings. In this paper, we consider a discrete-time nonlinear system with process noise. The system is controlled by an optimal policy that minimizes the expected discounted cumulative cost over an infinite time horizon. In particular, the cost function is a linear combination of a priori known feature functions. In this setting, we first adopt Lasserre's reformulation of the forward problem in terms of occupancy measures. Next, we propose an infinite-dimensional IOC algorithm and approximate it with Lagrange interpolating polynomials, which results in a convex, finite-dimensional sum-of-squares optimization problem. Moreover, we show that the estimator is asymptotically and statistically consistent. Finally, we validate the theoretical results and illustrate the performance of our method with numerical experiments, which also demonstrate the robustness and generalizability of the proposed IOC algorithm.
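The abstract approximates the infinite-dimensional problem with Lagrange interpolating polynomials. It does not specify here which functions are interpolated or how the multivariate construction is set up. The sketch below therefore shows only the one-dimensional interpolation step. The choice of Chebyshev nodes and the helper names `lagrange_basis` and `interpolate` are illustrative assumptions, not taken from the paper. The interpolant is `p(x) = sum_j f(x_j) L_j(x)`, where `L_j(x_k) = 1` if `j == k` and `0` otherwise. With `n` nodes it reproduces polynomials of degree below `n` exactly, and for smooth functions its error shrinks as `n` grows. This mirrors, informally, the idea of refining the approximation by increasing the polynomial order.

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate every Lagrange basis polynomial L_j at the points x.

    Returns an array of shape (len(x), len(nodes)); column j holds L_j(x),
    with L_j(nodes[k]) = 1 if j == k and 0 otherwise.
    """
    nodes = np.asarray(nodes, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = nodes.size
    B = np.ones((x.size, n))
    for j in range(n):
        for k in range(n):
            if k != j:
                B[:, j] *= (x - nodes[k]) / (nodes[j] - nodes[k])
    return B

def interpolate(f, nodes, x):
    """Interpolating polynomial p(x) = sum_j f(nodes[j]) L_j(x)."""
    nodes = np.asarray(nodes, dtype=float)
    return lagrange_basis(nodes, x) @ f(nodes)

def chebyshev_nodes(n):
    """Chebyshev points on [-1, 1] (an illustrative node choice)."""
    return np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))

n = 8
nodes = chebyshev_nodes(n)
x = np.linspace(-1.0, 1.0, 201)

# A degree-7 polynomial is reproduced exactly (up to rounding) by 8 nodes
poly = lambda t: 3 * t**7 - t**3 + 0.5
poly_err = np.max(np.abs(interpolate(poly, nodes, x) - poly(x)))
print(poly_err)

# A smooth non-polynomial function: the maximum error decreases as n grows
exp_errs = []
for m in (4, 8, 16):
    err = np.max(np.abs(interpolate(np.exp, chebyshev_nodes(m), x) - np.exp(x)))
    exp_errs.append(err)
    print(m, err)
```

The nested-loop form is chosen for readability. For many nodes, the barycentric form of Lagrange interpolation is the numerically preferred way to evaluate the same polynomial.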
