Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics

Jay Raut, Daniel N. Wilke, Stephan Schmidt

Abstract

Data normalisation, a common and often necessary preprocessing step in engineering and scientific applications, can severely distort the discovery of governing equations by magnitude-based sparse regression methods. This issue is particularly acute for the Sparse Identification of Nonlinear Dynamics (SINDy) framework, where the core assumption of sparsity is undermined by the interaction between data scaling and measurement noise. The resulting discovered models can be dense, uninterpretable, and physically incorrect. To address this critical vulnerability, we introduce the Sequential Thresholding of Coefficient of Variation (STCV), a novel, computationally efficient sparse regression algorithm that is inherently robust to data scaling. STCV replaces conventional magnitude-based thresholding with a dimensionless statistical metric, the Coefficient Presence (CP), which assesses the statistical validity and consistency of candidate terms in the model library. This shift from magnitude to statistical significance makes the discovery process invariant to arbitrary data scaling. Through comprehensive benchmarking on canonical dynamical systems and practical engineering problems, including a physical mass-spring-damper experiment, we demonstrate that STCV consistently and significantly outperforms standard Sequential Thresholding Least Squares (STLSQ) and Ensemble-SINDy (E-SINDy) on normalised, noisy datasets. The results show that STCV-based methods can successfully identify the correct, sparse physical laws even when other methods fail. By mitigating the distorting effects of normalisation, STCV makes sparse system identification a more reliable and automated tool for real-world applications, thereby enhancing model interpretability and trustworthiness.
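
The contrast between magnitude-based and dimensionless thresholding can be illustrated with a small numerical sketch. This is not the authors' STCV algorithm, and the Coefficient Presence (CP) metric is not defined in this excerpt. As a stand-in, the sketch uses the plain coefficient of variation (standard deviation over absolute mean) of least-squares coefficients across bootstrap refits. The two-term library, the noise level, the column scale factors, and the helper names `bootstrap_coefficients` and `coefficient_of_variation` are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_coefficients(theta, target, n_boot=200, seed=0):
    """Refit ordinary least squares on bootstrap resamples of the rows.

    Hypothetical helper: returns an (n_boot, n_terms) array of coefficients.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_terms = theta.shape
    coefs = np.empty((n_boot, n_terms))
    for b in range(n_boot):
        idx = rng.integers(0, n_samples, size=n_samples)
        coefs[b] = np.linalg.lstsq(theta[idx], target[idx], rcond=None)[0]
    return coefs


def coefficient_of_variation(coefs):
    """Dimensionless spread of each coefficient: std / |mean| over refits."""
    return coefs.std(axis=0) / np.abs(coefs.mean(axis=0))


# Synthetic library with two candidate terms; only the first is active.
rng = np.random.default_rng(42)
theta = rng.standard_normal((500, 2))
target = 2.0 * theta[:, 0] + 0.1 * rng.standard_normal(500)

# Assumed column rescaling, standing in for the effect of normalisation:
# the active column is inflated and the inactive column is shrunk.
scale = np.array([1e3, 1e-3])
theta_scaled = theta * scale

# The same seed gives the same resamples for both fits.
coefs_raw = bootstrap_coefficients(theta, target)
coefs_scaled = bootstrap_coefficients(theta_scaled, target)

mag_raw = np.abs(coefs_raw.mean(axis=0))
mag_scaled = np.abs(coefs_scaled.mean(axis=0))
cv_raw = coefficient_of_variation(coefs_raw)
cv_scaled = coefficient_of_variation(coefs_scaled)

print("mean |coef|, raw:   ", mag_raw)
print("mean |coef|, scaled:", mag_scaled)
print("CV, raw:   ", cv_raw)
print("CV, scaled:", cv_scaled)
```

Rescaling a library column by a factor s divides its least-squares coefficient by s, so a fixed magnitude threshold can rank the two terms differently before and after scaling. The ratio of standard deviation to mean cancels that factor, which is why the coefficient of variation is the same for both fits up to floating-point error.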

Paper Structure (18 sections, 6 equations, 9 figures, 5 tables)

This paper contains 18 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Experiment demonstrating the degradation of SINDy sampling requirements for the Lorenz system in Equation \ref{eqn:Lorentz1} when data is unscaled and normalised. The top and bottom rows respectively show noiseless data and data with 0.1% uniformly distributed noise added. The left and right columns, respectively, show the results of SINDy on unnormalised and normalised data. White indicates a 100% success rate in identifying the correct model sparsity, while black indicates a 0% success rate.
  • Figure 2: Success rate results from the Lorenz, Rössler, Van der Pol, and Duffing oscillator models on the Raw Data (RD) and the Scaled Data (SD). The model form and the parameters used may be found in Appendix \ref{app:dataGen}.
  • Figure 3: The success rate of various sparse regression algorithms for SINDy on noisy (scaled only, as unscaled cannot be modelled due to numerical limitations) linear oscillator data representing a damaged bearing housing during operation.
  • Figure 4: The success rate of various sparse regression algorithms for SINDy on noisy data from the simulated linear and nonlinear half-car models.
  • Figure 5: Experimental setup of physical linear/nonlinear mass-spring-damper system. Component list: 1) IMU wiring guide rod, 2) Linear springs, 3) Guide rod for oscillating mass, 4) Nonlinear magnetic springs simulated by stacked magnets on guide rods, 5) Rods securing end plates, 6) End plates holding springs and guide rod, 7) IMU glued to mass, 8) Oscillating mass, 9) Arduino logger.
  • ...and 4 more figures