3DIOC: Direct Data-Driven Inverse Optimal Control for LTI Systems

Chendi Qu; Jianping He; Xiaoming Duan

TL;DR

This work tackles inverse optimal control for linear time-invariant systems by learning the finite-horizon LQ objective weights $Q$ and $R$ directly from input-output trajectories, without identifying the system dynamics. It builds a data-enabled, model-free IOC framework using the Fundamental Lemma to obtain an input-output representation and derives model-free KKT conditions that connect data blocks to the unknown $Q$ and $R$. The authors propose Vanilla and Simplified 3DIOC formulations with identifiability and perturbation analyses, including a special LQR-IOC case, and demonstrate computational efficiency and robustness through simulations. The approach reduces data and computation while providing guarantees on identifiability and sensitivity to noise, with potential extensions to process-noise and Koopman-based nonlinear settings.

Abstract

This paper develops a direct data-driven inverse optimal control (3DIOC) algorithm for linear time-invariant (LTI) systems under linear quadratic (LQ) control, in which the underlying objective function is learned directly from measured input-output trajectories without system identification. Using the Fundamental Lemma, we establish an input-output representation of the LTI system. We then derive a model-free necessary optimality condition for the forward LQ problem, which connects the objective function to the collected data and allows the inverse optimal control problem to be solved. We further refine the algorithm so that it requires less computation and data. An identifiability condition and a perturbation analysis are provided. Simulations demonstrate the efficiency and performance of our algorithms.
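To make the forward LQ problem concrete, the following is a minimal sketch of a finite-horizon LQ controller computed by a backward Riccati recursion. It is an illustration under stated assumptions, not the paper's formulation: the stage cost $x_k^\top Q x_k + u_k^\top R u_k$, the terminal weight equal to $Q$, full state measurement, the example matrices, and the helper names `finite_horizon_lq_gains` and `rollout` are all hypothetical. The last lines also illustrate why $Q$ and $R$ can only be recovered up to a common positive scale: scaling both by the same factor leaves the optimal policy unchanged.

```python
import numpy as np

def finite_horizon_lq_gains(A, B, Q, R, N):
    """Backward Riccati recursion; returns feedback gains K_0, ..., K_{N-1}."""
    P = Q.copy()  # assumed terminal weight P_N = Q
    gains = []
    for _ in range(N):
        # K_k = (R + B' P_{k+1} B)^{-1} B' P_{k+1} A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_k = Q + A' P_{k+1} (A - B K_k)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # the recursion produced K_{N-1} first
    return gains

def rollout(A, B, gains, x0):
    """Simulate x_{k+1} = A x_k + B u_k under u_k = -K_k x_k."""
    xs, us = [x0], []
    for K in gains:
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(us)

# Hypothetical double-integrator-like example (not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.5])
R = np.array([[0.2]])
x0 = np.array([1.0, 0.0])

gains = finite_horizon_lq_gains(A, B, Q, R, N=20)
xs, us = rollout(A, B, gains, x0)

# Scaling (Q, R) by the same positive constant gives the same gains.
scaled_gains = finite_horizon_lq_gains(A, B, 3.0 * Q, 3.0 * R, N=20)
same_policy = all(np.allclose(K1, K2) for K1, K2 in zip(gains, scaled_gains))
```

An inverse optimal control method observes trajectories such as `xs` and `us` and tries to recover weights consistent with them; in this sketch `same_policy` being true shows that the data cannot distinguish $(Q, R)$ from $(3Q, 3R)$.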

Paper Structure (14 sections, 11 theorems, 54 equations, 3 figures, 1 table)

Introduction
Preliminaries and Problem Formulation
Problem Description
Fundamental Lemma
Direct Data-Driven IOC
Model-free KKT Condition
Vanilla 3DIOC
Simplified 3DIOC without Redundant Variables
Special Case for LQR-IOC
Simulations
Conclusion
Proof of Theorem \ref{iden_th}
Proof of Theorem \ref{s-iden_th}
Proof of Theorem \ref{per-ana}

Key Result

Lemma 1

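(Fundamental Lemma \cite{willems2005note}) Consider the linear time-invariant system $\mathscr{B}$ in sys. If a) system $\mathscr{B}$ is controllable, b) $w^d \in \mathscr{B}|_T$, and c) the input $u^d$ is persistently exciting of order $L+n$, then we have $\operatorname{colspan}(\mathscr{H}_L(w^d)) = \mathscr{B}|_L$, where $\mathscr{H}_L(w^d)$ is the Hankel matrix of depth $L$ built from $w^d$.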
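The lemma can be checked numerically on a small example. The sketch below builds a depth-$L$ Hankel matrix from one recorded input-output trajectory, tests persistency of excitation of order $L+n$ through a rank condition on the input Hankel matrix, and verifies that a fresh length-$L$ trajectory of the same system lies in the column span of the data matrix. The system matrices, dimensions, random seed, and the helper names `block_hankel` and `simulate` are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def block_hankel(w, L):
    """w has shape (T, q); returns the (q*L) x (T-L+1) block Hankel matrix."""
    T = w.shape[0]
    return np.column_stack([w[i:i + L].reshape(-1) for i in range(T - L + 1)])

def simulate(A, B, C, D, u, x0):
    """Outputs of x_{k+1} = A x_k + B u_k, y_k = C x_k + D u_k."""
    x, ys = x0, []
    for uk in u:
        ys.append(C @ x + D @ uk)
        x = A @ x + B @ uk
    return np.array(ys)

rng = np.random.default_rng(0)

# Hypothetical controllable and observable system with n = 2, m = 1.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
n, m, L, T = 2, 1, 4, 40

# Recorded data w^d = (u^d, y^d) from a random input.
u_d = rng.standard_normal((T, m))
y_d = simulate(A, B, C, D, u_d, np.zeros(n))
w_d = np.hstack([u_d, y_d])

# Persistency of excitation of order L+n: the input Hankel matrix
# of depth L+n has full row rank m*(L+n).
pe = np.linalg.matrix_rank(block_hankel(u_d, L + n)) == m * (L + n)

# Data matrix; for this minimal example its rank should be m*L + n.
H = block_hankel(w_d, L)
rank_H = np.linalg.matrix_rank(H)

# A fresh length-L trajectory from a random initial state.
u_new = rng.standard_normal((L, m))
y_new = simulate(A, B, C, D, u_new, rng.standard_normal(n))
w_new = np.hstack([u_new, y_new]).reshape(-1)

# Solve H g = w_new in the least-squares sense and measure the residual.
g, *_ = np.linalg.lstsq(H, w_new, rcond=None)
residual = np.linalg.norm(H @ g - w_new)
```

With noise-free data the residual is at the level of floating-point round-off, which is the practical meaning of the lemma: the data matrix can stand in for a parametric model when predicting or constraining length-$L$ trajectories.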

Figures (3)

Figure 1: Flow chart of the proposed problems.
Figure 2: Estimation errors of $Q, R$ without noise. For each $T_{ini}$ we run the simulation $15$ times. The blue points show the estimation error of each experiment. The purple line is the mean error.
Figure 3: Estimation errors in the presence of observation noise. We run the simulation $15$ times for each noise variance. The orange points show the error of each experiment. The solid red curve is the mean error and the dashed line is its variance.

Theorems & Definitions (18)

Definition 1
Lemma 1
Lemma 2
Lemma 3
Proposition 1
Proposition 2
Lemma 4: Model-free KKT
Definition 2: Scalar Ambiguity
Theorem 1: Identifiability
proof
...and 8 more
