Online Learning for Nonlinear Dynamical Systems without the I.I.D. Condition

Lantian Zhang, Silun Zhang

TL;DR

The paper addresses online identification and prediction for nonlinear stochastic dynamical systems from a single trajectory, where closed-loop feedback makes the data non-i.i.d. It introduces an online projected Newton-type estimator and an online predictor, and proves, using stochastic Lyapunov arguments and martingale techniques, that the accumulated prediction regret satisfies $R_T=o(T)$ (so the average regret $R_T/T$ converges to zero) without requiring persistent excitation (PE). It also proves almost-sure convergence of the parameter estimates under a new excitation condition that is weaker than traditional PE. The main theoretical contributions are a global convergence result for online parameter estimation and a logarithmic regret bound $R_T=O(\log T)$ when the outputs are bounded, together with a demonstration that the excitation condition covers a broader class of trajectories. Numerical experiments on nonlinear stochastic systems show the average regret tending to zero in various exploration regimes and confirm convergence of the estimates beyond the PE regime, which supports the use of the method for online learning in adaptive control and nonlinear observers under weak data assumptions.

Abstract

This paper investigates online identification and prediction for nonlinear stochastic dynamical systems. In contrast to offline learning methods, we develop online algorithms that learn unknown parameters from a single trajectory. A key challenge in this setting is handling the non-independent data generated by the closed-loop system. Existing theoretical guarantees for such systems are mostly restricted to the assumption that inputs are independently and identically distributed (i.i.d.), or that the closed-loop data satisfy a persistent excitation (PE) condition. However, these assumptions are often violated in applications such as adaptive feedback control. In this paper, we propose an online projected Newton-type algorithm for parameter estimation in nonlinear stochastic dynamical systems, and develop an online predictor for system outputs based on online parameter estimates. By using both the stochastic Lyapunov function and martingale estimation methods, we demonstrate that the average regret converges to zero without requiring traditional persistent excitation (PE) conditions. Furthermore, we establish a novel excitation condition that ensures global convergence of the online parameter estimates. The proposed excitation condition is applicable to a broader class of system trajectories, including those violating the PE condition.
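To make the distinction between vanishing average regret and parameter convergence concrete, the following is a minimal sketch, not the paper's algorithm. It swaps in ordinary recursive least squares (a Newton-type update for a linear model) with a simple Euclidean projection onto a ball, and applies it to a hypothetical linear closed-loop system with pure feedback $u_t=-0.3\,y_t$, so the regressor never leaves a one-dimensional subspace and PE fails. The system, the gains, the noise level, the projection radius, the function name `run_online_rls`, and the regret definition (squared gap between the best one-step predictor and the online predictor) are all assumptions chosen for illustration.

```python
import numpy as np


def run_online_rls(T=2000, seed=0, radius=5.0):
    """Projected RLS on a hypothetical closed-loop linear system.

    Model (assumed for illustration): y_{t+1} = theta*^T phi_t + w_{t+1},
    with phi_t = [y_t, u_t] and pure feedback u_t = -0.3 * y_t.
    Returns the running average regret and the final parameter error.
    """
    rng = np.random.default_rng(seed)
    theta_star = np.array([0.5, 1.0])  # hypothetical true parameter
    theta_hat = np.zeros(2)            # initial estimate
    P = np.eye(2)                      # inverse of the regularized Gram matrix
    y = 0.0
    cum_regret = 0.0
    avg_regret = np.empty(T)

    for t in range(T):
        u = -0.3 * y                   # feedback only, no exploration signal
        phi = np.array([y, u])         # regressor always lies on one line

        y_pred = theta_hat @ phi       # online prediction
        y_best = theta_star @ phi      # best predictor given the past
        cum_regret += (y_best - y_pred) ** 2
        avg_regret[t] = cum_regret / (t + 1)

        y_next = y_best + 0.1 * rng.standard_normal()

        # Newton-type (recursive least squares) update.
        P_phi = P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)
        theta_hat = theta_hat + gain * (y_next - y_pred)
        P = P - np.outer(gain, P_phi)

        # Euclidean projection onto a ball (a simplification of a
        # projected step; the radius is an assumed prior bound).
        norm = np.linalg.norm(theta_hat)
        if norm > radius:
            theta_hat = theta_hat * (radius / norm)

        y = y_next

    param_error = float(np.linalg.norm(theta_star - theta_hat))
    return avg_regret, param_error


if __name__ == "__main__":
    avg_regret, param_error = run_online_rls()
    print("final average regret:", avg_regret[-1])
    print("final parameter error:", param_error)
```

In this toy setting the average regret becomes small even though the parameter error stays large, because the data only reveal the parameter along the single direction visited by the regressor. Convergence of the estimates themselves needs some excitation condition, which is the role of the weaker-than-PE condition described in the abstract.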

Paper Structure

This paper contains 6 sections, 6 theorems, 83 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Under Assumptions 1-4, the sample paths of the accumulated regret satisfy $R_{T}=o(T)$ as $T\to\infty$, where $R_{T}$ is the accumulated regret defined in the paper. Moreover, if the system output sequence $\{y_{t}, t\geq 0\}$ is bounded, then $R_{T}=O(\log T)$.

Figures (2)

  • Figure 1: Trajectories of $\frac{1}{t}R_{t}$
  • Figure 2: Trajectories of $\|\theta^{*}-\hat{\theta}_{t}\|^{2}$

Theorems & Definitions (18)

  • Definition 1: $\rho$-stable
  • Remark 1
  • Example 1: Linear dynamics
  • Example 2: RNN dynamics
  • Example 3: Binary-valued dynamics
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Remark 3
  • Lemma 1
  • ...and 8 more