Online Control-Informed Learning
Zihao Liang, Tianyu Zhou, Zehui Lu, Shaoshuai Mou
TL;DR
This work addresses online learning for autonomous systems modeled as optimal control problems with unknown parameters. It introduces Online Control-Informed Learning (OCIL), which pairs an EKF-based online parameter estimator with a gradient generator derived from Pontryagin differential programming to update the unknown parameters $\boldsymbol{\theta}$ as streaming measurements $\boldsymbol{O}_t$ arrive. The framework supports Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, and its convergence analysis shows local asymptotic convergence under standard observability and noise assumptions. Experiments on the Cartpole, Quadrotor, and Rocket environments show faster online convergence, improved robustness to measurement noise, and online/offline performance competitive with state-of-the-art baselines, all while running on a CPU. OCIL thus offers a data-efficient, online, control-informed route to learning-enabled robotics that is practical in real-time settings.
Abstract
This paper proposes an Online Control-Informed Learning (OCIL) framework, which employs well-established optimal control and state estimation techniques from the field of control to solve a broad class of learning tasks online. This integration handles practical issues in machine learning such as noisy measurement data, online learning, and data efficiency. By modeling any robot as a tunable optimal control system, we propose an online parameter estimator based on the extended Kalman filter (EKF) that incrementally tunes the system as data arrive, enabling it to complete designated learning or control tasks. The proposed method also improves learning robustness by effectively handling noise in the data. Theoretical analysis establishes the convergence of OCIL. Three learning modes of OCIL, i.e., Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, are investigated in experiments that validate their effectiveness.
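The idea of an EKF-based online parameter estimator can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a random-walk model $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t + \boldsymbol{w}_t$ for the unknown parameters, uses a hypothetical damped pendulum with $\boldsymbol{\theta} = [a, b]$ as the unknown system (an online system identification setting), treats each measurement $\boldsymbol{O}_t$ as a noisy full-state observation, and swaps the Pontryagin differential programming gradient generator for a central finite-difference Jacobian. The helper names (`dynamics`, `jacobian_fd`, `ekf_step`, `run`) and all numerical values are illustrative assumptions.

```python
import numpy as np

DT = 0.05  # integration step (assumed value)


def dynamics(x, u, theta):
    """Hypothetical damped pendulum x_{t+1} = f(x_t, u_t; theta), theta = [a, b].

    Illustrative assumption, not a model from the paper (semi-implicit Euler step).
    """
    q, v = x
    a, b = theta
    v_next = v + DT * (-a * np.sin(q) - b * v + u)
    q_next = q + DT * v_next
    return np.array([q_next, v_next])


def jacobian_fd(h, theta, eps=1e-6):
    """Central-difference Jacobian dh/dtheta.

    Assumed stand-in for an analytic gradient generator such as PDP.
    """
    y0 = h(theta)
    J = np.zeros((y0.size, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        J[:, i] = (h(theta + d) - h(theta - d)) / (2 * eps)
    return J


def ekf_step(theta, P, obs, h, Q, R):
    """One EKF update treating theta as a random-walk state."""
    P = P + Q                      # predict: theta constant up to process noise Q
    H = jacobian_fd(h, theta)      # linearize the measurement model at the estimate
    S = H @ P @ H.T + R            # innovation covariance
    K = np.linalg.solve(S, H @ P).T  # Kalman gain P H^T S^{-1} (S, P symmetric)
    theta = theta + K @ (obs - h(theta))
    P = (np.eye(theta.size) - K @ H) @ P
    return theta, P


def run(n_steps=2000, sigma=0.01, seed=0):
    """Identify theta online from noisy streaming state measurements."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([9.8, 0.5])          # hypothetical ground truth
    theta, P = np.array([5.0, 0.0]), 10.0 * np.eye(2)
    Q = 1e-8 * np.eye(2)
    R = 2 * sigma**2 * np.eye(2)  # approx. variance of differenced noise (assumption)
    x = np.array([0.1, 0.0])
    x_meas = x + sigma * rng.normal(size=2)
    for t in range(n_steps):
        u = 4.0 * np.sin(2.0 * t * DT) + rng.normal(0.0, 2.0)  # exciting input
        x = dynamics(x, u, theta_true)          # true (unknown) system
        obs = x + sigma * rng.normal(size=2)    # noisy streaming measurement O_t
        # Measurement model: next state predicted from the last measured state
        h = lambda th: dynamics(x_meas, u, th)
        theta, P = ekf_step(theta, P, obs, h, Q, R)
        x_meas = obs
    return theta


print("estimated theta:", run())
```

Because this toy model is linear in $\boldsymbol{\theta}$, the finite-difference Jacobian is essentially exact and the update behaves like a Kalman filter; for models that are nonlinear in $\boldsymbol{\theta}$, the linearization holds only locally around the current estimate.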
