Stochastic Online Optimization for Cyber-Physical and Robotic Systems
Hao Ma, Melanie Zeilinger, Michael Muehlebach
TL;DR
The paper addresses online control of cyber-physical and robotic systems with nonlinear, partially observed dynamics by developing a gradient-based online optimization framework that uses approximate dynamics as prior knowledge. It provides a unified non-convex convergence analysis for both online gradient descent and an online quasi-Newton method realized via a trust-region-like approach, and quantifies how the modeling error $\kappa$ degrades convergence. The framework is validated in simulations of a cantilever beam and a four-legged robot and in real-world experiments with a table-tennis robot, demonstrating fast convergence and robustness to modeling errors. A key theoretical contribution is decoupling the approximate Hessian from past randomness and deriving regret bounds that remain sub-linear under smoothness and variance assumptions. The work thus bridges theory and practice: approximate gradients and a trust-region interpretation enable effective online learning and continued adaptation during deployment for complex, nonlinear, partially observable cyber-physical systems (CPS) and robots.
Abstract
We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam, a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.
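To make the role of the approximate model concrete, here is a minimal Python sketch of online gradient descent in which the gradient is estimated from a measured output and a rough model Jacobian. It is an illustration under stated assumptions, not the authors' algorithm: the toy plant `true_system`, the identity model `J_approx`, the tracking loss, and the `eta / sqrt(t)` step-size schedule are all hypothetical choices made for this example.

```python
import numpy as np


def true_system(u, rng, noise_std=0.01):
    """Hypothetical plant: mildly nonlinear, observed with noise.

    The learner never sees A_true; it only queries the output.
    """
    A_true = np.array([[1.0, 0.2], [0.0, 0.8]])
    noise = noise_std * rng.standard_normal(u.shape)
    return A_true @ u + 0.1 * np.tanh(u) + noise


def online_gradient_descent(y_ref, J_approx, steps=200, eta=0.5, seed=0):
    """Track y_ref using gradients built from an approximate Jacobian.

    Loss at step t: 0.5 * ||y_t - y_ref||^2, with y_t measured on the plant.
    The exact gradient would need the true Jacobian; J_approx stands in.
    """
    rng = np.random.default_rng(seed)
    u = np.zeros_like(y_ref)
    losses = []
    for t in range(1, steps + 1):
        y = true_system(u, rng)          # one noisy rollout of the real system
        err = y - y_ref
        losses.append(0.5 * float(err @ err))
        grad_est = J_approx.T @ err      # approximate gradient from the model
        u = u - (eta / np.sqrt(t)) * grad_est  # assumed decaying step size
    return u, losses


# Rough prior model: the identity, which ignores coupling and nonlinearity.
J_approx = np.eye(2)
y_ref = np.array([1.0, -0.5])
u_final, losses = online_gradient_descent(y_ref, J_approx)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this toy setting the identity model is a deliberately crude stand-in for the true Jacobian, yet the iterates still drive the tracking loss down to the noise floor, which mirrors the abstract's claim that even rough estimates of the dynamics can be useful. Making `J_approx` less accurate is one way to experiment with how modeling error affects convergence.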
