Online Control-Informed Learning

Zihao Liang; Tianyu Zhou; Zehui Lu; Shaoshuai Mou

TL;DR

This work addresses online learning for autonomous systems modeled as optimal control problems with unknown parameters. It introduces Online Control-Informed Learning (OCIL), which pairs an EKF-based online parameter estimator with a gradient generator derived from Pontryagin Differentiable Programming to update the unknown parameters $\boldsymbol{\theta}$ as streaming measurements $\boldsymbol{O}_t$ arrive. The framework supports three modes (Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly) and comes with a convergence analysis showing local asymptotic convergence under standard observability and noise assumptions. Experiments on Cartpole, Quadrotor, and Rocket environments show faster online convergence, improved robustness to measurement noise, and competitive online and offline performance relative to state-of-the-art baselines, all while running on CPU. OCIL thus offers a data-efficient, control-informed route to learning-enabled robotics that is practical in real-time settings.

Abstract

This paper proposes an Online Control-Informed Learning (OCIL) framework, which employs well-established optimal control and state estimation techniques from the control field to solve a broad class of learning tasks online. This novel integration handles practical issues in machine learning such as noisy measurement data, online learning, and data efficiency. By treating any robot as a tunable optimal control system, we propose an online parameter estimator based on the extended Kalman filter (EKF) that incrementally tunes the system, enabling it to complete designated learning or control tasks. The proposed method also improves robustness in learning by effectively managing noise in the data. Theoretical analysis is provided to demonstrate the convergence of OCIL. Three learning modes of OCIL, i.e., Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, are investigated via experiments, which validate their effectiveness.
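The idea of an EKF-based online parameter estimator can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the unknown parameter follows a random walk, uses a toy scalar system $x_{t+1} = a x_t + b u_t$ with $\boldsymbol{\theta} = (a, b)$, and swaps in a finite-difference Jacobian where OCIL would use its control-informed gradient generator. All function names, noise levels, and covariances below are hypothetical choices made for illustration.

```python
import numpy as np


def numerical_jacobian(h, theta, eps=1e-6):
    """Central-difference Jacobian of h at theta.

    Stand-in (assumption) for an analytic gradient generator.
    """
    theta = np.asarray(theta, dtype=float)
    y0 = np.atleast_1d(h(theta))
    jac = np.zeros((y0.size, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        jac[:, i] = (np.atleast_1d(h(theta + d)) - np.atleast_1d(h(theta - d))) / (2 * eps)
    return jac


def ekf_parameter_step(theta, P, y, h, R, Q):
    """One EKF update for a parameter modeled as a random walk."""
    P_pred = P + Q                          # predict: mean unchanged, covariance inflated
    L = numerical_jacobian(h, theta)        # linearize the predicted output around theta
    S = L @ P_pred @ L.T + R                # innovation covariance
    K = P_pred @ L.T @ np.linalg.inv(S)     # Kalman gain
    theta_new = theta + K @ (y - np.atleast_1d(h(theta)))
    P_new = (np.eye(theta.size) - K @ L) @ P_pred
    return theta_new, P_new


def run_demo(steps=200, seed=0):
    """Estimate (a, b) of a toy scalar system from noisy streaming measurements."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([0.9, 0.5])       # hypothetical ground-truth parameters
    theta = np.zeros(2)                     # initial guess
    P = np.eye(2)
    R = np.array([[1e-4]])                  # measurement noise covariance (std 0.01)
    Q = 1e-8 * np.eye(2)                    # small process noise on the parameters
    x = 1.0
    for _ in range(steps):
        u = rng.normal()                    # exciting input
        x_next = theta_true[0] * x + theta_true[1] * u
        y = np.array([x_next + 0.01 * rng.normal()])   # noisy measurement of the next state
        h = lambda th, x=x, u=u: np.array([th[0] * x + th[1] * u])
        theta, P = ekf_parameter_step(theta, P, y, h, R, Q)
        x = x_next
    return theta


theta_hat = run_demo()
```

Each measurement is consumed once and then discarded, which is what makes the estimator online. Because this toy output is linear in the parameters, the update reduces to recursive least squares; the filter only becomes a genuine EKF when the predicted output depends nonlinearly on $\boldsymbol{\theta}$.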

Paper Structure (20 sections, 3 theorems, 51 equations, 9 figures, 4 tables)

Introduction
Related Work
Contributions
Problem Formulation
Main Results
Online Parameter Estimator
Gradient Generator
OCIL Framework
Convergence Analysis
Applications to Different Online Learning Modes and Experiments
Online Computational Performance
Limitations
Conclusions
Proof of Lemma \ref{lemma:convergenceKalman}
Proof of Theorem \ref{theorem}
...and 5 more sections

Key Result

Lemma 1

\cite{jin2020pontryagin} Define the Jacobian and Hessian matrices related to $\boldsymbol{\xi}(\boldsymbol{\theta})$ as: If $\boldsymbol{H}_{t}^{uu}$ is invertible for all $t=0,\cdots,T-1$, the following recursions from $t=T$ to $t=0$ hold: with $\boldsymbol{V}_{T}=\boldsymbol{H}^{xx}_{T}$ and $\boldsymbol{W}_{T}=\boldsymbol{H}^{x\theta}_{T}$. Here, $\boldsymbol{A}_{t}=\boldsymbol{F}_{t}-\boldsymbol{G}

Figures (9)

Figure 1: Framework of Online Control-Informed Learning.
Figure 2: Imitation loss vs. number of data points.
Figure 3: Trajectories of the cartpole in online imitation learning. Blue solid lines: learned trajectory. Green solid lines: observed noisy trajectory. Red dashed lines: ground truth.
Figure 4: Trajectories of the quadrotor in online imitation learning. Blue solid lines: learned trajectory. Green solid lines: observed noisy trajectory. Red dashed lines: ground truth.
Figure 5: SysID loss vs. number of data points.
...and 4 more figures

Theorems & Definitions (8)

Remark 1
Lemma 1
Remark 2
Remark 3
Lemma 2
Remark 4
Remark 5
Theorem 1
