Statistically consistent inverse optimal control for discrete-time indefinite linear-quadratic systems

Han Zhang; Axel Ringh

TL;DR

This work tackles inverse optimal control for discrete-time finite-horizon indefinite linear-quadratic systems with random horizons and stochastic disturbances. It develops necessary and sufficient conditions for the forward problem's solvability, proves identifiability of the IOC mapping, and constructs a convex, statistically consistent estimator that recovers the true cost parameters from noisy demonstrations. By formulating a relaxed HJB-violation objective and leveraging empirical averages, the approach yields a globally optimal solution for $(Q,q)$ and is backed by a convergence theory as the number of demonstrations grows. Numerical experiments on moderate-scale systems and a nonzero-sum pursuit–evasion game demonstrate the method's practicality, scalability, and data-driven capability to uncover underlying cost structures in complex LQR settings with indefinite costs and variable horizons.

Abstract

The Inverse Optimal Control (IOC) problem is a structured system identification problem that aims to identify the underlying objective function based on observed optimal trajectories. This provides a data-driven way to model experts' behavior. In this paper, we consider the case of discrete-time finite-horizon linear-quadratic problems where: the quadratic cost term in the objective is not necessarily positive semi-definite; the planning horizon is a random variable; we have both process noise and observation noise; the dynamics can have a drift term; and the objective can have a linear cost term. In this setting, we first formulate the necessary and sufficient conditions for when the forward optimal control problem is solvable. Next, we show that the corresponding IOC problem is identifiable. Using the conditions for existence of an optimum of the forward problem, we then formulate an estimator for the parameters in the objective function of the forward problem as the globally optimal solution to a convex optimization problem, and prove that the estimator is statistically consistent. Finally, the performance of the algorithm is demonstrated on two numerical examples.
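To make the solvability question for an indefinite forward problem concrete, the following is a minimal sketch and not the paper's conditions. It assumes a simplified deterministic problem with dynamics $x_{t+1} = Ax_t + Bu_t$, cost $\sum_{t=0}^{N-1}(x_t^\top Q x_t + u_t^\top R u_t) + x_N^\top Q x_N$, no drift, no linear cost term, and no noise. The function name `riccati_solvability_check`, the terminal weight, and the tolerance are illustrative assumptions. The sketch runs the standard backward Riccati recursion and checks that $R + B^\top P B$ stays positive definite, a standard sufficient condition for each stage to have a unique minimizer even when $Q$ is indefinite.

```python
import numpy as np


def riccati_solvability_check(A, B, Q, R, N, tol=1e-10):
    """Backward Riccati recursion for a finite-horizon LQ problem (sketch).

    Assumed setup (not taken from the paper): x_{t+1} = A x_t + B u_t,
    stage cost x'Qx + u'Ru, terminal cost x_N' Q x_N, Q possibly indefinite.
    Returns (ok, gains); gains[t] is K_t in the feedback law u_t = -K_t x_t.
    If ok is False, gains is empty.
    """
    P = Q.copy()  # terminal weight P_N = Q (illustrative assumption)
    gains = []
    for _ in range(N):
        H = R + B.T @ P @ B
        H = 0.5 * (H + H.T)  # symmetrize against round-off
        # A negative eigenvalue of H makes the stage problem unbounded below.
        # The singular case is rejected here too, which is conservative.
        if np.min(np.linalg.eigvalsh(H)) <= tol:
            return False, []
        K = np.linalg.solve(H, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        P = 0.5 * (P + P.T)
        gains.append(K)
    gains.reverse()  # recursion runs from t = N-1 down to t = 0
    return True, gains


if __name__ == "__main__":
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    R = np.array([[1.0]])
    Q_mild = np.diag([1.0, -0.1])   # indefinite, but the check passes
    Q_bad = np.diag([1.0, -2.0])    # R + B'QB = -1 < 0, so the check fails
    print(riccati_solvability_check(A, B, Q_mild, R, N=2)[0])
    print(riccati_solvability_check(A, B, Q_bad, R, N=2)[0])
```

The two example matrices show that indefiniteness of $Q$ alone does not decide solvability in this simplified setting: what matters is how the negative directions of $Q$ interact with $B$ and $R$ along the recursion.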

Paper Structure (15 sections, 12 theorems, 75 equations, 1 figure)

Introduction
Problem formulation
Forward problem analysis
Necessary and sufficient conditions for existence of optimal control
Analysis of the closed-loop system matrices
Identifiability analysis and persistent excitation
The IOC algorithm
Construction and empirical approximation
Statistical consistency analysis
On implementation and the computational complexity of the estimator
Numerical examples
Demonstration of performance for a system with both modest size and planning horizon
Identification of cost in non-zero sum pursuit-evasion game
Conclusion
Deferred proofs

Key Result

Proposition 2.5

Under the assumptions labeled ass:controlability_and_full_rank and ass:IID, if the optimal control problem (eq:stochastic_forward_problem) with the objective function given by $(\bar{Q}, \bar{q}, \bar{R})$ admits a solution for planning horizon $N = \nu$ for any $\bar{x}\in\mathbb{R}^n$, then it admits a solution

Figures (1)

Figure 1: Log-log plot of the mean and standard deviation of the relative error of $Q_{est}$ as a function of the number of trajectories. The estimates are obtained using noisy data, as described in the section on the non-zero sum pursuit-evasion game.

Theorems & Definitions (16)

Remark 2.1
Proposition 2.5
Theorem 3.1: Boundedness of forward problem
Remark 3.2
Remark 3.3
Proposition 3.4
Corollary 3.5
Remark 3.6
Corollary 3.7
Proposition 4.1: Identifiability
...and 6 more
