Stochastic Online Optimization for Cyber-Physical and Robotic Systems

Hao Ma; Melanie Zeilinger; Michael Muehlebach

TL;DR

The paper addresses online control of cyber-physical and robotic systems with nonlinear, partially observed dynamics by developing a gradient-based online optimization framework that uses approximate dynamics as prior knowledge. It provides a unified non-convex convergence analysis for online gradient descent and for an online quasi-Newton method, which admits a trust-region interpretation, and it quantifies how the modeling error modulus $\kappa$ degrades convergence. The framework is validated in simulations of a cantilever beam and a four-legged robot and in real-world experiments with a table-tennis robot, demonstrating fast convergence and robustness to modeling errors. A key theoretical contribution is decoupling the approximate Hessian from past randomness and deriving regret bounds that remain sub-linear under smoothness and bounded-variance assumptions. Together, these results show how approximate gradients enable online learning and continued adaptation during deployment for complex, nonlinear, partially observable systems.

Abstract

We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam, a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.
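To make the role of an approximate model concrete, the following is a minimal sketch of online gradient descent driven by a distorted gradient estimate. It is not the paper's algorithm: the quadratic loss $\tfrac{1}{2}\|\omega - \zeta\|^2$, the scalar `model_gain` that scales the true gradient (a stand-in for model mismatch, here a 30% relative error), the step size $\eta_0/\sqrt{t}$, and all function names are assumptions chosen for illustration.

```python
import math
import random


def approx_gradient(omega, zeta, model_gain):
    """Gradient of 0.5 * ||omega - zeta||^2, scaled by a hypothetical model mismatch.

    model_gain = 1.0 would return the exact gradient; 0.7 underestimates it by 30%.
    """
    return [model_gain * (w - z) for w, z in zip(omega, zeta)]


def online_gradient_descent(steps=2000, eta0=0.5, noise_std=0.1,
                            model_gain=0.7, seed=0):
    """Run online gradient descent with one random sample zeta per iteration."""
    rng = random.Random(seed)
    target = [1.0, -2.0]   # mean of zeta, hence the minimizer of the expected loss
    omega = [0.0, 0.0]     # initial decision variable
    for t in range(1, steps + 1):
        # Draw the random parameter zeta_t around the target.
        zeta = [m + rng.gauss(0.0, noise_std) for m in target]
        # Use the distorted gradient instead of the exact one.
        g = approx_gradient(omega, zeta, model_gain)
        # Diminishing step size (an assumption, not the paper's schedule).
        eta = eta0 / math.sqrt(t)
        omega = [w - eta * gi for w, gi in zip(omega, g)]
    return omega, target


if __name__ == "__main__":
    omega, target = online_gradient_descent()
    print("final iterate:", omega, "target:", target)
```

In this toy setting the distorted gradient still points toward the minimizer, so the iterates approach the target; a larger mismatch (a `model_gain` closer to zero) slows the contraction, which mirrors qualitatively the claim that modeling errors degrade the convergence rate.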

Paper Structure (30 sections, 6 theorems, 90 equations, 17 figures, 5 tables, 2 algorithms)

Introduction
Motivation
Online Learning
Related Work
Contribution
Structure
Problem Setting: Stochastic Online Learning
Interpretation of Algorithm \ref{algo:online_quasi_newton} as Trust-Region Approach
Connection to Cyber-Physical Systems and Robotics
Experiments
Cantilever Beam
Reference Trajectory Distribution
Gradient Estimation
Network and Input Structure
Experiments
...and 15 more sections

Key Result

Theorem 2.1

Let the loss functions $f\left(\cdot; \zeta\right):\Omega \rightarrow \mathbb{R}$ satisfy Assumption asp:smoothness and Assumption asp:bounded_variance, and let the pseudo-Hessian $A_t$ satisfy Assumption asp:bounded_hessian. Let the estimate $\mathcal{G}\left(u_t\right)$ satisfy Assumption asp:mode Then the following inequality holds: where $\omega^{\star}:=\mathop{\mathrm{arg\,min}}\limits_{\om
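As a hedged illustration of how a pseudo-Hessian $A_t$ can enter an update, the sketch below implements a generic quasi-Newton step $\omega_{t+1} = \omega_t - \eta\, A_t^{-1} g_t$ in two dimensions. This is the standard form of such a step, not necessarily the paper's exact algorithm. The step is also the minimizer of the quadratic model $g_t^\top d + \tfrac{1}{2\eta}\, d^\top A_t d$ in the displacement $d$, which is one common way to read a quasi-Newton update as a penalized (soft) trust region around $\omega_t$. The function names, the regularization `eps`, and the 2x2 restriction are assumptions made to keep the example self-contained.

```python
def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is numerically singular")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def quasi_newton_step(omega, grad, pseudo_hessian, eta, eps=1e-3):
    """Return omega - eta * (A + eps * I)^{-1} grad for a 2x2 pseudo-Hessian A.

    The eps * I term is a hypothetical safeguard that keeps the matrix
    positive definite when A is only positive semidefinite.
    """
    a = [
        [pseudo_hessian[0][0] + eps, pseudo_hessian[0][1]],
        [pseudo_hessian[1][0], pseudo_hessian[1][1] + eps],
    ]
    direction = solve_2x2(a, grad)
    return [w - eta * d for w, d in zip(omega, direction)]


if __name__ == "__main__":
    omega = [0.0, 0.0]
    grad = [-10.0, -0.1]                 # gradient estimate at omega
    A = [[8.0, 0.5], [0.5, 0.15]]        # rough curvature estimate
    print(quasi_newton_step(omega, grad, A, eta=0.5))
```

Scaling the gradient by $A_t^{-1}$ shortens the step along directions of high estimated curvature and lengthens it along flat ones; an upper and lower bound on $A_t$, as in the bounded-Hessian assumption named in the theorem, keeps that rescaling from becoming arbitrarily large or small.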

Figures (17)

Figure 1: This figure illustrates the geometric meaning of the modeling error modulus $\kappa$ in two-dimensional space. The expectation of the gradient estimate $\mathbb{E}_{\zeta} \left[\left. \mathcal{F}\left(\omega_t;\zeta\right)\right| \omega_t \right]$ lies within the open ball with center $\mathbb{E}_{\zeta} \left[\left. \nabla f\left(\omega_t;\zeta\right)\right| \omega_t \right]$ and radius $\left|\mathbb{E}_{\zeta} \left[\left. \nabla f\left(\omega_t;\zeta\right)\right| \omega_t \right] \right|/\sqrt{\lambda}$.
Figure 2: The figure shows the classical two-degrees-of-freedom control framework in panel (a), which includes a feedforward controller and a feedback controller, and a pure feedforward control framework in panel (b). The variable $n_{\text{d}}$ denotes a disturbance, which will subsequently be used to obtain an approximate gradient $\mathcal{G}\left(u_t\right)$.
Figure 3: Deformation of the cantilever beam under the active torque and an external disturbance $d$, where the dashed line represents the position of the cantilever beam when at rest.
Figure 4: The figure depicts the discrete model of the cantilever beam obtained using the lumped-parameter method. The entire beam is decomposed into $n$ rigid units, with adjacent units interconnected by a pair of spring and damper. Each rigid unit has a length of $l_i$, and the angle it makes with the horizontal plane is denoted by $\alpha_i$. The active torque is applied only to the first rigid unit hinged to the wall. The gray object indicates the position of the beam when at rest.
Figure 5: The figure illustrates the range of reference trajectories used for training, where the gray lines are composed of $400$ randomly sampled reference trajectories. The red dashed boxes indicate the spatial and temporal distribution range of the points $y_a$ and $y_b$, respectively.
...and 12 more figures

Theorems & Definitions (13)

Theorem 2.1
Corollary 2.1
proof
Definition 2.1
Lemma 2.1
proof
Corollary 2.2
proof
Lemma 1.1
proof
...and 3 more
