Symbolic Regression on Sparse and Noisy Data with Gaussian Processes
Junette Hsin, Shubhankar Agarwal, Adam Thorpe, Luis Sentis, David Fridovich-Keil
TL;DR
The paper tackles the challenge of deriving analytic dynamical models from sparse and noisy data, where standard SINDy struggles because derivatives estimated from noisy measurements are unreliable. It introduces GPSINDy, a framework that denoises state measurements with Gaussian processes to obtain smooth trajectories $\boldsymbol{X}_{GP}$ and derivatives $\dot{\boldsymbol{X}}_{GP}$. It then performs sparse symbolic regression via ADMM-LASSO on a candidate function library evaluated at $\boldsymbol{X}_{GP}$ and the control inputs $\boldsymbol{U}$. The GP kernel is chosen by comparing the marginal likelihood of several candidate kernels, so that the denoising adapts to the structure of the data. The method is validated on Lotka-Volterra and unicycle simulations and on NVIDIA JetRacer hardware data. It shows consistently lower coefficient estimation error and trajectory RMSE than SINDy and neural-network baselines, especially under high noise and data sparsity. This approach yields robust, interpretable dynamical models suitable for robotics and control applications when data are limited or noisy.
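The GP denoising step can be sketched in a few lines of NumPy. This is a minimal sketch under our own assumptions, not the authors' implementation. It treats one state channel at a time and uses two hypothetical candidate kernels (`rbf` for squared-exponential and `matern32` for Matérn-3/2). It selects among a fixed grid of hyperparameters by log marginal likelihood rather than optimizing them. It obtains the derivative by differentiating the GP posterior mean analytically, which may differ from how the paper computes $\dot{\boldsymbol{X}}_{GP}$.

```python
import numpy as np

# Assumption: two candidate stationary kernels. Each returns K(a, b) and dK/da,
# the derivative with respect to the first input, used to differentiate the GP mean.
def rbf(a, b, ell, s2):
    d = a[:, None] - b[None, :]
    K = s2 * np.exp(-0.5 * d**2 / ell**2)
    return K, -d / ell**2 * K

def matern32(a, b, ell, s2):
    d = a[:, None] - b[None, :]
    c = np.sqrt(3.0) / ell
    e = np.exp(-c * np.abs(d))
    return s2 * (1.0 + c * np.abs(d)) * e, -s2 * c**2 * d * e

KERNELS = {"rbf": rbf, "matern32": matern32}

def log_marginal_likelihood(t, y, name, ell, s2, noise):
    """log p(y | t) for a zero-mean GP with i.i.d. Gaussian noise of variance `noise`."""
    K = KERNELS[name](t, t, ell, s2)[0] + noise * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2.0 * np.pi))

def gp_smooth(t, y, candidates):
    """Pick the (kernel, ell, s2, noise) tuple with the highest marginal likelihood,
    then return the posterior mean and its time derivative at the sample times t."""
    best = max(candidates, key=lambda h: log_marginal_likelihood(t, y, *h))
    name, ell, s2, noise = best
    K, dK = KERNELS[name](t, t, ell, s2)
    alpha = np.linalg.solve(K + noise * np.eye(len(t)), y)
    return K @ alpha, dK @ alpha, best

# Usage: one noisy state channel (a sine wave stands in for measured data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(t) + 0.05 * rng.standard_normal(t.size)
candidates = [(k, ell, 1.0, nz) for k in KERNELS
              for ell in (0.5, 1.0, 2.0) for nz in (1e-3, 1e-2)]
x_gp, dx_gp, best = gp_smooth(t, y, candidates)
print(best)
```

A fixed grid keeps the sketch short. A fuller implementation would typically optimize each kernel's hyperparameters by maximizing the marginal likelihood before comparing kernels.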
Abstract
In this paper, we address the challenge of deriving dynamical models from sparse and noisy data. High-quality data is crucial for symbolic regression algorithms; limited and noisy data can present modeling challenges. To overcome this, we combine Gaussian process regression with a sparse identification of nonlinear dynamics (SINDy) method to denoise the data and identify nonlinear dynamical equations. Our approach, GPSINDy, offers improved robustness with sparse, noisy data compared to SINDy alone. We demonstrate its effectiveness on simulation data from Lotka-Volterra and unicycle models and on hardware data from an NVIDIA JetRacer system. We show superior performance over the baselines, including a more than 50% improvement over SINDy and other baselines in predicting future trajectories from noise-corrupted, sparse 5 Hz data.
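The sparse-regression half of the pipeline can be sketched with the standard scaled-form ADMM iteration for the LASSO. To keep the example short, we assume exact derivatives evaluated at a grid of Lotka-Volterra states. These stand in for the GP-smoothed $\boldsymbol{X}_{GP}$ and $\dot{\boldsymbol{X}}_{GP}$. Several choices here are hypothetical and not taken from the paper: the degree-2 polynomial library, the unit-norm column scaling, the heuristic for the ADMM penalty $\rho$, the Lotka-Volterra parameters, and the helper names `library` and `admm_lasso`.

```python
import numpy as np

TERMS = ["1", "x", "y", "x^2", "xy", "y^2"]

def library(X):
    """Hypothetical candidate library: all monomials of (x, y) up to degree 2."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def admm_lasso(A, B, lam, max_iter=50_000, tol=1e-9):
    """Minimize 0.5*||A @ Xi - B||_F^2 + lam*||Xi||_1 with scaled-form ADMM.
    Each column of B (one state derivative) is an independent LASSO problem."""
    norms = np.linalg.norm(A, axis=0)
    An = A / norms                                # assumption: unit-norm columns
    H = An.T @ An
    eig = np.linalg.eigvalsh(H)                   # ascending Gram eigenvalues
    rho = np.sqrt(max(eig[0], 1e-12) * eig[-1])   # heuristic penalty parameter
    M = np.linalg.inv(H + rho * np.eye(H.shape[0]))
    AtB = An.T @ B
    z = np.zeros_like(AtB)
    u = np.zeros_like(AtB)
    for _ in range(max_iter):
        xi = M @ (AtB + rho * (z - u))                           # ridge-like step
        z_old = z
        v = xi + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft threshold
        u = u + xi - z                                           # scaled dual step
        if (np.linalg.norm(xi - z) < tol
                and rho * np.linalg.norm(z - z_old) < tol):
            break
    return z / norms[:, None]                     # undo the column scaling

# Usage: Lotka-Volterra dx/dt = a*x - b*x*y, dy/dt = -c*y + d*x*y (parameters assumed).
a, b, c, d = 1.0, 0.5, 1.0, 0.5
g = np.linspace(0.5, 3.0, 20)
X = np.array([(x, y) for x in g for y in g])                    # stand-in for X_GP
Xdot = np.column_stack([a * X[:, 0] - b * X[:, 0] * X[:, 1],
                        -c * X[:, 1] + d * X[:, 0] * X[:, 1]])  # stand-in for Xdot_GP
Xi = admm_lasso(library(X), Xdot, lam=1e-4)
for name, row in zip(TERMS, Xi):
    print(f"{name:>4}: dx/dt {row[0]: .3f}   dy/dt {row[1]: .3f}")
```

Because these derivatives are noise-free, a very small `lam` should recover coefficients close to $a$, $-b$, $-c$, and $d$, with the remaining library terms near zero. With noisy or GP-smoothed data, `lam` would need to be tuned.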
