Online Coreset Selection for Learning Dynamic Systems

Jingyuan Li, Dawei Shi, Ling Shi

TL;DR

This work tackles online data-efficient learning for dynamic systems under bounded disturbances by developing an online coreset selection framework within a set-membership identification (SMI) paradigm. It introduces a stacked polyhedral over-approximation of the feasible parameter set (FPS) and a threshold-based, geometry-driven coreset trigger that guarantees contraction of uncertainty under persistent excitation, with an explicit bound on the worst-case FPS volume $\mu_{\infty}$ in terms of $C(\alpha_0,n_z)$ and the trigger counts. The analysis also yields a Hausdorff-distance bound under disturbance-bound mismatch and an upper bound on the expected coreset size, plus extensions to nonlinear systems that are linear in the parameters and to bounded measurement noise, all validated by simulations. The proposed approach provides deterministic, worst-case guarantees for online data reduction in data-driven control (DDC), offering a practical pathway to real-time, data-efficient robust learning and control in dynamic environments.

Abstract

With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we develop an online coreset selection method for set-membership identification in the presence of process disturbances, improving data efficiency while preserving convergence guarantees. Specifically, we derive a stacked polyhedral representation that over-approximates the feasible parameter set. Based on this representation, we propose a geometric selection criterion that retains a data point only if it induces a sufficient contraction of the feasible set. Theoretically, the feasible-set volume is shown to converge to zero almost surely under persistently exciting data and a tight disturbance bound. When the disturbance bound is mismatched, an explicit Hausdorff-distance bound is derived to quantify the resulting identification error. In addition, an upper bound on the expected coreset size is established and extensions to nonlinear systems with linear-in-the-parameter structures and to bounded measurement noise are discussed. The effectiveness of the proposed method is demonstrated through comprehensive simulation studies.
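The selection idea in the abstract (keep a data point only if it shrinks the feasible set enough) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a scalar-output model $y_k = z_k^\top \theta + w_k$ with $|w_k| \le \epsilon$, replaces the stacked polyhedral set with a simple axis-aligned box, and uses a hypothetical volume-ratio threshold `tau` in place of the paper's $\alpha_0$-based trigger. The names `tighten_box` and `select_coreset` are illustrative only.

```python
import numpy as np

def tighten_box(lo, hi, z, y, eps):
    """Shrink the box [lo, hi] using the strip |y - z @ theta| <= eps.

    The result still contains every theta in the original box that is
    consistent with the measurement (interval constraint propagation).
    """
    lo, hi = lo.copy(), hi.copy()
    # Range of each term z_j * theta_j over the current box.
    term_min = np.minimum(z * lo, z * hi)
    term_max = np.maximum(z * lo, z * hi)
    for i in range(len(z)):
        if z[i] == 0.0:
            continue  # this measurement says nothing about theta_i
        rest_min = term_min.sum() - term_min[i]
        rest_max = term_max.sum() - term_max[i]
        # z_i * theta_i must lie in [y - eps - rest_max, y + eps - rest_min].
        a = (y - eps - rest_max) / z[i]
        b = (y + eps - rest_min) / z[i]
        lo[i] = max(lo[i], min(a, b))
        hi[i] = min(hi[i], max(a, b))
    return lo, hi

def select_coreset(Z, Y, lo0, hi0, eps, tau=0.05):
    """Keep sample k only if it shrinks the box volume by at least tau (assumed rule)."""
    lo, hi = np.asarray(lo0, float), np.asarray(hi0, float)
    kept = []
    for k, (z, y) in enumerate(zip(Z, Y)):
        new_lo, new_hi = tighten_box(lo, hi, z, y, eps)
        ratio = np.prod(new_hi - new_lo) / np.prod(hi - lo)
        if ratio <= 1.0 - tau:  # sufficient contraction: store the point
            lo, hi = new_lo, new_hi
            kept.append(k)
    return kept, lo, hi

# Synthetic data with a known parameter and a tight disturbance bound.
rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.5])
eps = 0.1
Z = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = Z @ theta_true + rng.uniform(-eps, eps, size=200)
kept, lo, hi = select_coreset(Z, Y, [-2.0, -2.0], [2.0, 2.0], eps)
print(f"kept {len(kept)} of {len(Z)} samples; box = [{lo}, {hi}]")
```

A box is a much coarser over-approximation than a polyhedron, so this sketch only shows the trigger logic: discarded points never enlarge the set, and the true parameter stays inside it as long as the disturbance bound holds.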

Paper Structure

This paper contains 25 sections, 17 theorems, 93 equations, 6 figures, 1 algorithm.

Key Result

Proposition 1

Consider system (eq:system) with the full dataset $\mathcal{D}_{[K]}$, and suppose Assumptions as:disturbance and as:initialSet hold. Then the stacked polyhedral set over-approximates the feasible parameter set; here $\Theta_0$ and $\Theta_K$ are defined in (eq:Theta0) and (eq:SME_Theta*), respectively.

Figures (6)

  • Figure 1: Schematic diagram of the proposed online learning framework.
  • Figure 2: Verification of the PE condition in Assumption as:PE: minimum eigenvalue of the covariance matrix over sliding windows of length $N_u=20$.
  • Figure 3: Performance of the threshold-triggered estimator ($\alpha_0 = -0.3$). (a) Evolution of the worst-case volume $\mu_{\infty}(\widehat{\Theta}_k)$. (b) Cumulative number of selected data points.
  • Figure 4: Evolution of feasible parameter sets. The top and bottom rows depict the 3D polyhedral uncertainty sets $P^1_k$ and $P^2_k$ at $k = 0$, $k = 20$, and $k = 80$, respectively. Red stars mark the true parameter values.
  • Figure 5: Comparison of performance under different values of the threshold parameter $\alpha_0 \in \{-1,-0.5,-0.3,-0.2,0\}$. (a) Evolution of the worst-case feasible-set volume $\mu_{\infty}(\widehat{\Theta}_k)$. (b) Cumulative number of selected data points. The shaded regions represent the range between the minimum and maximum values.
  • ...and 1 more figure
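
The check described in the Figure 2 caption (minimum eigenvalue over sliding windows of length $N_u = 20$) can be sketched as follows. The exact matrix used in the paper is not given here, so this example assumes the normalized, uncentered second-moment matrix of the regressors in each window; `min_eig_sliding` is a hypothetical helper name.

```python
import numpy as np

def min_eig_sliding(Z, window=20):
    """Smallest eigenvalue of (1/window) * sum z z^T over each sliding window."""
    Z = np.asarray(Z, float)
    out = []
    for k in range(window, len(Z) + 1):
        W = Z[k - window:k]
        G = W.T @ W / window                  # assumed form of the windowed matrix
        out.append(np.linalg.eigvalsh(G)[0])  # eigvalsh sorts eigenvalues ascending
    return np.array(out)

rng = np.random.default_rng(1)
Z_rich = rng.normal(size=(100, 3))   # regressors that excite all directions
Z_poor = Z_rich.copy()
Z_poor[:, 2] = Z_poor[:, 0]          # collinear columns: one direction is never excited

lam_rich = min_eig_sliding(Z_rich)
lam_poor = min_eig_sliding(Z_poor)
print(lam_rich.min(), np.abs(lam_poor).max())
```

A minimum eigenvalue that stays bounded away from zero across all windows is the usual numerical evidence for persistent excitation; the collinear case shows what a failure looks like.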

Theorems & Definitions (46)

  • Remark 1
  • Proposition 1
  • proof
  • Remark 2
  • Proposition 2
  • Remark 3
  • Definition 1: Persistent excitation
  • Definition 2: Tightness of the bound
  • Remark 4
  • Lemma 1
  • ...and 36 more