Table of Contents
Fetching ...

Scalable Sparse Regression for Model Discovery: The Fast Lane to Insight

Matthew Golden

TL;DR

The paper addresses scalable discovery of governing equations from data when symbolic libraries are large and coefficients may be very small. It introduces SPRINT, a fast, rank-1 update–driven extension of exhaustive sparse regression that uses bisection to identify optimal column modifications in the library matrix ${\bf G}$, with two modes: ${\rm SPRINT-}$ (removal) and ${\rm SPRINT+}$ (addition). The method relies on updates to the smallest singular value via secular functions after rank-1 changes, yielding empirical computational scaling of roughly $O(|\mathcal{L}|^{3.38})$ for SPRINT-- and $O(|\mathcal{L}|^{1.65})$ for SPRINT+, and a simple, largely hyperparameter-free model selection rule. Demonstrations on Kuramoto–Sivashinsky and MHD-like library sizes show that SPRINT+ can recover tiny coefficients (on the order of $10^{-6}$) and reproduce the same optimization elbow as exhaustive search, while offering orders-of-magnitude speedups for large symbol libraries. The approach enables robust, interpretable model discovery in high-dimensional symbolic spaces and can be parallelized to further reduce computation, making data-driven discovery feasible for complex dynamical systems.

Abstract

There exist endless examples of dynamical systems with vast available data and unsatisfying mathematical descriptions. Sparse regression applied to symbolic libraries has quickly emerged as a powerful tool for learning governing equations directly from data; these learned equations balance quantitative accuracy with qualitative simplicity and human interpretability. Here, I present a general purpose, model agnostic sparse regression algorithm that extends a recently proposed exhaustive search leveraging iterative Singular Value Decompositions (SVD). This accelerated scheme, Scalable Pruning for Rapid Identification of Null vecTors (SPRINT), uses bisection with analytic bounds to quickly identify optimal rank-1 modifications to null vectors. It is intended to maintain sensitivity to small coefficients and be of reasonable computational cost for large symbolic libraries. A calculation that would take the age of the universe with an exhaustive search but can be achieved in a day with SPRINT.

Scalable Sparse Regression for Model Discovery: The Fast Lane to Insight

TL;DR

The paper addresses scalable discovery of governing equations from data when symbolic libraries are large and coefficients may be very small. It introduces SPRINT, a fast, rank-1 update–driven extension of exhaustive sparse regression that uses bisection to identify optimal column modifications in the library matrix , with two modes: (removal) and (addition). The method relies on updates to the smallest singular value via secular functions after rank-1 changes, yielding empirical computational scaling of roughly for SPRINT-- and for SPRINT+, and a simple, largely hyperparameter-free model selection rule. Demonstrations on Kuramoto–Sivashinsky and MHD-like library sizes show that SPRINT+ can recover tiny coefficients (on the order of ) and reproduce the same optimization elbow as exhaustive search, while offering orders-of-magnitude speedups for large symbol libraries. The approach enables robust, interpretable model discovery in high-dimensional symbolic spaces and can be parallelized to further reduce computation, making data-driven discovery feasible for complex dynamical systems.

Abstract

There exist endless examples of dynamical systems with vast available data and unsatisfying mathematical descriptions. Sparse regression applied to symbolic libraries has quickly emerged as a powerful tool for learning governing equations directly from data; these learned equations balance quantitative accuracy with qualitative simplicity and human interpretability. Here, I present a general purpose, model agnostic sparse regression algorithm that extends a recently proposed exhaustive search leveraging iterative Singular Value Decompositions (SVD). This accelerated scheme, Scalable Pruning for Rapid Identification of Null vecTors (SPRINT), uses bisection with analytic bounds to quickly identify optimal rank-1 modifications to null vectors. It is intended to maintain sensitivity to small coefficients and be of reasonable computational cost for large symbolic libraries. A calculation that would take the age of the universe with an exhaustive search but can be achieved in a day with SPRINT.
Paper Structure (5 sections, 13 equations, 5 figures)

This paper contains 5 sections, 13 equations, 5 figures.

Figures (5)

  • Figure 1: The size of a library $|\mathcal{L}_n|$ using 3+1D ideal MHD variables. A 12-letter symbolic alphabet is made up of eight physical fields and four partial derivatives. The library size is trivially bounded by equation \ref{['eq:Ln_upper_bound']} in red. Permutation symmetries make many words redundant but this does not prevent the catastrophic scaling of $|\mathcal{L}_n|$. A complete library allowing only up to 4 letters per word will require thousands of library terms.
  • Figure 2: An evaluation of the secular function $f_{-}(\sigma)$ for an example matrix ${\bf G}$ in $\mathbb{R}^{3\times 3}$. The last column of the matrix is set to be removed. (a) $f_-(\sigma)$ has poles at the singular values with definitive signs: the new singular values $\sigma'_i$ will be between the previous singular values $\sigma_i > \sigma'_i > \sigma_{i+1}$ when there is no degeneracy in ${\bf G}$. The two new singular values are the roots of $f(\sigma)$ represented by red points. (b) The regularized function $\tilde{f}_{-}(\sigma)$, which is now finite at the smallest two singular values. Note that $\tilde{f}_{-}(\sigma)$ is not monotonic. The blue points represent the bounds on the singular value, and the values of $\tilde{f}_-$ at these points will always be of opposite sign. (c) The numerical convergence of the bisection. $\sigma_{(k)}$ is the $k$th iteration of bisection.
  • Figure 3: A numerical solution $u(x,t)$ to equation \ref{['eq:KS_mod']}. The domain size $L=22$ is chosen to make the evolution chaotic. The red rectangles show the sizes and random distribution of weak-form subdomains used for the weak evaluation of library terms.
  • Figure 4: Optimization curve showing the greedy residual $r_k$ as a function of nonzero terms in ${\bf c}$. The starting subspace for SPRINT+ was the $k=1$ choice of $\partial_x^2 u$ to coincide with the exhaustive search. (a) Optimization curve for the library including the time derivative $\partial_t u$. A dynamic closure is found as demonstrated by the elbow at $k=8$. (b) The optimization curve when the time derivative is not included in the library. Instead of a dynamic closure, an approximate spatial constraint is found on the solution manifold at $k=13$.
  • Figure 5: (a) Empirical walltime scaling for various sparse regression techniques. Power law exponents were empirically determined by performing a least squares fit to $\log(t) = \alpha \log(n) + \beta$ for $\alpha$ and $\beta$. Each point shows the walltime of an algorithm applied to a real square matrix with elements drawn from a uniform distribution over $[-1,1]$. The reported walltime is the mean of eight independent trials. Four algorithms are considered: the exhaustive search, SPRINT--, SPRINT+, and an implementation of implicit-SINDy available at https://github.com/dynamicslab/SINDy-PIkaheman2020sindy. The exhaustive search algorithm displays the $n^{\sim4}$ scaling, while the SPRINT-- algorithm has approximately $n^{\sim3}$ scaling. These algorithms start with similar cost for small libraries, but quickly depart as the library size $n$ grows beyond O(100). (b) The extrapolated walltimes of sparse regression for the library sizes of Figure \ref{['fig:library_scaling']}. Calculations that would take the exhaustive search the order of a Hubble time can be accomplished in one day with SPRINT+.