Turning Time Series into Algebraic Equations: Symbolic Machine Learning for Interpretable Modeling of Chaotic Time Series

Madhurima Panja, Grace Younes, Tanujit Chakraborty

TL;DR

The paper proposes two complementary symbolic forecasters that learn explicit, interpretable algebraic equations from chaotic time series data. Both achieve competitive one-step-ahead accuracy while providing transparent equations that reveal salient aspects of the underlying dynamics.

Abstract

Chaotic time series are notoriously difficult to forecast. Small uncertainties in initial conditions amplify rapidly, while strong nonlinearities and regime-dependent variability constrain predictability. Although modern deep learning often delivers strong short-horizon accuracy, its black-box nature limits scientific insight and practical trust in settings where understanding the underlying dynamics matters. To address this gap, we propose two complementary symbolic forecasters that learn explicit, interpretable algebraic equations from chaotic time series data. The Symbolic Neural Forecaster (SyNF) adapts a neural-network-based equation learning architecture to the forecasting setting, enabling fully differentiable discovery of compact and interpretable algebraic relations. The Symbolic Tree Forecaster (SyTF) builds on evolutionary symbolic regression to search directly over equation structures under a principled accuracy-complexity trade-off. We evaluate both approaches in a rolling-window nowcasting setting with one-step-ahead forecasting using several accuracy metrics, and compare them against a broad suite of baselines spanning classical statistical models, tree ensembles, and modern deep learning architectures. Numerical experiments cover a benchmark of 132 low-dimensional chaotic attractors and two real-world chaotic time series, namely weekly dengue incidence in San Juan and the Niño 3.4 sea surface temperature index. Across datasets, the symbolic forecasters achieve competitive one-step-ahead accuracy while providing transparent equations that reveal salient aspects of the underlying dynamics.
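
The rolling-window, one-step-ahead protocol mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a fixed polynomial basis fitted by least squares stands in for the symbolic search that SyNF or SyTF would perform, the model is refit at every step (the abstract does not say whether the paper does this), and the names `make_lagged`, `features`, `rolling_one_step`, and `logistic_map`, the window length of 200, and the logistic-map test series are all hypothetical choices made for this example.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values preceding target y."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    X = np.column_stack([series[i:i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def features(X):
    """Hypothetical operator set: constant, linear, and squared lag terms."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])


def rolling_one_step(series, n_lags=5, window=200):
    """Refit on the latest `window` rows, then predict the next value."""
    X, y = make_lagged(series, n_lags)
    preds = []
    for t in range(window, len(y)):
        Phi = features(X[t - window:t])
        # Least squares is a stand-in for symbolic equation discovery.
        coef, *_ = np.linalg.lstsq(Phi, y[t - window:t], rcond=None)
        preds.append((features(X[t:t + 1]) @ coef)[0])
    return np.array(preds), y[window:]


def logistic_map(n, r=3.9, x0=0.2):
    """Simple chaotic test series (an assumption, not one of the paper's datasets)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x


series = logistic_map(600)
preds, actual = rolling_one_step(series, n_lags=5, window=200)
rmse = float(np.sqrt(np.mean((preds - actual) ** 2)))
print(f"one-step-ahead RMSE over {len(preds)} forecasts: {rmse:.3e}")
```

Because the logistic map is itself a quadratic function of the previous value, the assumed basis can represent it exactly. Real chaotic series would generally need the richer operator sets that symbolic forecasters search over.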

Paper Structure

This paper contains 18 sections, 23 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: A dataset of 132 distinct low-dimensional chaotic systems, colored by largest Lyapunov exponent ($\lambda_{\max}$).
  • Figure 2: Real-world datasets along with their autocorrelation function (ACF) and partial autocorrelation function (PACF).
  • Figure 3: Symbolic Forecasting Architectures. The upper panel illustrates the architecture of the Symbolic Neural Forecaster (SyNF), where a differentiable neural training mechanism is used to learn interpretable forecasting equations from symbolic operators and lagged inputs. The lower panel presents the workflow of the Symbolic Tree Forecaster (SyTF), which employs an evolutionary search to construct and refine symbolic expression trees based on predictive performance and interpretability. The demonstration is conducted on a simulated dataset. The mathematical expression obtained from the SyNF model is $\hat{y}_t = 0.1110 + 0.9421y_{t-1} + 0.0001\left(y_{t-1}\right)^2$ (top image) whereas the SyTF architecture yields $\hat{y}_{t, \text{SyTF}} = 0.9613y_{t-1}$ (bottom image) using a single historical lagged observation. The one-step-ahead rolling window forecasts are also presented in the upper and lower panels for the simulated example datasets.
  • Figure 4: Comparative performance of symbolic forecasting approaches and state-of-the-art frameworks for one-step ahead (rolling window) forecasting of chaotic datasets using five historical observations for each model. In the plot, panels show error distributions across four metrics: RMSE (top-left), MAE (top-right), MARRE (bottom-left), and SMAPE (bottom-right). Red dots denote median performance; error bars represent the interquartile range (IQR). For all panels, models are ranked by ascending median error (lower values indicate better performance).
  • Figure 5: Comparative performance of symbolic forecasting approaches and state-of-the-art frameworks for one-step ahead forecasting of chaotic datasets using ten historical observations for each model. In the plot, panels show error distributions across four metrics: RMSE (top-left), MAE (top-right), MARRE (bottom-left), and SMAPE (bottom-right). Red dots denote median performance; error bars represent the interquartile range (IQR). For all panels, models are ranked by ascending median error (lower values indicate better performance).
  • ...and 4 more figures
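
The two expressions quoted in the Figure 3 caption and the four error metrics named in Figures 4 and 5 can be evaluated with a short sketch like the one below. The coefficients are copied from the caption. The metric formulas are commonly used definitions and are an assumption here, since this page does not reproduce the paper's own formulas (MARRE in particular is taken to be the mean absolute error divided by the range of the observations, in percent). The function names and the toy series are hypothetical.

```python
import numpy as np


def synf_example(y_prev):
    """SyNF expression quoted in the Figure 3 caption."""
    return 0.1110 + 0.9421 * y_prev + 0.0001 * y_prev ** 2


def sytf_example(y_prev):
    """SyTF expression quoted in the Figure 3 caption."""
    return 0.9613 * y_prev


def rmse(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def mae(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))


def marre(y, yhat):
    # Assumed definition: mean absolute error relative to the observed range, in percent.
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(100.0 * np.mean(np.abs(y - yhat)) / (y.max() - y.min()))


def smape(y, yhat):
    # Assumed definition: symmetric MAPE in percent; undefined if y and yhat are both 0.
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(100.0 * np.mean(2.0 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat))))


# Toy series (made up for illustration): forecast y[t] from y[t-1] with each equation.
y = np.array([10.0, 10.4, 10.1, 9.7, 9.9, 10.3])
y_prev, y_true = y[:-1], y[1:]
for name, model in [("SyNF", synf_example), ("SyTF", sytf_example)]:
    yhat = model(y_prev)
    print(name, rmse(y_true, yhat), mae(y_true, yhat),
          marre(y_true, yhat), smape(y_true, yhat))
```

Both expressions use only a single lagged observation, so a forecast is one arithmetic evaluation per step. That is the sense in which the learned equations stay easy to inspect.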

Theorems & Definitions (1)

  • Remark 1