Online Coreset Selection for Learning Dynamic Systems
Jingyuan Li, Dawei Shi, Ling Shi
TL;DR
This work addresses online, data-efficient learning of dynamic systems under bounded disturbances by developing an online coreset selection framework within the set-membership identification (SMI) paradigm. It introduces a stacked polyhedral over-approximation of the feasible parameter set (FPS) and a threshold-based, geometry-driven coreset trigger that guarantees contraction of the uncertainty under persistent excitation, with an explicit bound on the worst-case FPS volume $\mu_{\infty}$ in terms of $C(\alpha_0,n_z)$ and the trigger counts. The analysis also yields a Hausdorff-distance bound under disturbance-bound mismatch and an upper bound on the expected coreset size, together with extensions to nonlinear systems with linear-in-the-parameters structure and to bounded measurement noise, all validated in simulation. The approach provides deterministic, worst-case guarantees for online data reduction in data-driven control (DDC), offering a practical route to real-time, data-efficient robust learning and control in dynamic environments.
Abstract
With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we develop an online coreset selection method for set-membership identification in the presence of process disturbances, improving data efficiency while preserving convergence guarantees. Specifically, we derive a stacked polyhedral representation that over-approximates the feasible parameter set. Based on this representation, we propose a geometric selection criterion that retains a data point only if it induces a sufficient contraction of the feasible set. Theoretically, the feasible-set volume is shown to converge to zero almost surely under persistently exciting data and a tight disturbance bound. When the disturbance bound is mismatched, an explicit Hausdorff-distance bound is derived to quantify the resulting identification error. In addition, an upper bound on the expected coreset size is established, and extensions to nonlinear systems with linear-in-the-parameters structure and to bounded measurement noise are discussed. The effectiveness of the proposed method is demonstrated through comprehensive simulation studies.
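To make the selection idea concrete, the following is a minimal sketch of a contraction-triggered coreset rule for a deliberately simplified case: a scalar model $y_k = \theta x_k + w_k$ with $|w_k| \le \bar{w}$, where the feasible parameter set is an interval rather than the stacked polyhedral set developed in the paper. The function names, the relative-contraction threshold `eps`, and the interval representation are all assumptions made for illustration; they are not the paper's algorithm or notation.

```python
import random


def constraint_interval(x, y, w_bar):
    """Interval of theta consistent with |y - theta * x| <= w_bar, for x != 0."""
    a, b = (y - w_bar) / x, (y + w_bar) / x
    return min(a, b), max(a, b)


def select_coreset(data, w_bar, lo, hi, eps=0.1):
    """Keep a sample only if it shrinks the feasible interval by a factor (1 - eps).

    Hypothetical 1-D stand-in for a geometric selection criterion:
    [lo, hi] is the current feasible interval for theta, and eps is an
    assumed relative-contraction threshold.
    """
    coreset = []
    for x, y in data:
        if x == 0.0:
            continue  # a zero regressor carries no information about theta
        c_lo, c_hi = constraint_interval(x, y, w_bar)
        new_lo, new_hi = max(lo, c_lo), min(hi, c_hi)
        if new_hi < new_lo:
            # An empty intersection signals that w_bar underestimates the disturbance.
            raise ValueError("empty feasible set: disturbance bound is too small")
        if (new_hi - new_lo) <= (1.0 - eps) * (hi - lo):
            coreset.append((x, y))  # informative sample: retain it and contract the set
            lo, hi = new_lo, new_hi
        # otherwise the sample is discarded and the interval is left unchanged
    return coreset, (lo, hi)


# Illustrative data stream with an assumed true parameter and disturbance bound.
rng = random.Random(0)
theta_true, w_bar = 0.7, 0.1
stream = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    stream.append((x, theta_true * x + rng.uniform(-w_bar, w_bar)))

coreset, (lo, hi) = select_coreset(stream, w_bar, lo=-10.0, hi=10.0, eps=0.1)
print(len(coreset), lo, hi)
```

In this sketch every retained sample shrinks the interval width by at least the factor `1 - eps`, so the width after `m` retained samples is at most `(1 - eps)**m` times the initial width. This gives a simple one-dimensional intuition for why a contraction threshold limits the coreset size; it is not a substitute for the volume and coreset-size bounds stated above.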
