Predicting Dynamics of Ultra-Large Complex Systems by Inferring Governing Equations

Qi Shao, Duxin Chen, Jiawen Chen, Yujie Zeng, Athen Ma, Wenwu Yu, Vito Latora, Wei Lin

Abstract

Predicting the behavior of ultra-large complex systems, from climate to biological and technological networks, is a central unsolved challenge. Existing approaches face a fundamental trade-off: equation discovery methods provide interpretability but fail to scale, while neural networks scale but operate as black boxes and often lose reliability over long time horizons. Here, we introduce the Sparse Identification Graph Neural Network (SIGN), a framework that overcomes this divide by inferring the governing equations of large networked systems from data. By defining symbolic discovery as edge-level information, SIGN decouples the scalability of sparse identification from network size, enabling efficient equation discovery even in large systems. SIGN scales to networks with over 100,000 nodes while remaining robust to noise, sparse sampling, and missing data. Across diverse benchmark systems, including coupled chaotic oscillators, neural dynamics, and epidemic spreading, it recovers governing equations with high precision and sustains accurate long-term predictions. Applied to a data set of temperature time series measured at 71,987 sea surface positions, SIGN identifies a compact predictive network model and captures large-scale sea surface temperature conditions up to two years in advance. By enabling equation discovery at previously inaccessible scales, SIGN opens a path toward interpretable and reliable prediction of real-world complex systems.

Paper Structure

This paper contains 17 sections, 10 equations, 5 figures.

Figures (5)

  • Figure 1: SIGN pipeline. SIGN facilitates accurate and scalable equation discovery through a two-phase process. a, An example of the network of coupled Rössler oscillators that we aim to reconstruct from its time-series observations. b, Candidate libraries for intrinsic terms $F$ and coupling terms $C$, each generated from a set of nonlinear basis functions. c, Phase I: Sparse regression on a small subset of nodes, followed by DBSCAN clustering to determine a global support mask. d, Phase II: Message passing on the full graph using the learned mask to estimate the shared coefficients of $F$ and $C$. e, Masked intrinsic and coupling messages, used to form the recovered equations. f, Coefficient recovery error (sMAPE) across network topologies and sizes. g, Reconstructed versus true trajectories from integrating the inferred equations, showing the model’s predictive accuracy.
  • Figure 2: Equation discovery across diverse networked dynamical systems using the SIGN framework. a, Kuramoto phase-oscillator network. Here $x_i(t)$ denotes the phase of oscillator $i$, and the interaction between connected oscillators follows the sinusoidal coupling $\sin(x_j-x_i)$, where $j$ ranges over neighbors of $i$ defined by the network adjacency $A_{ij}$ (i.e., edges indicate which pairs interact). The left panel reports the inference error of the coupling-term coefficient across synthetic scale-free networks with sizes of $10^3$ and $10^5$ nodes, respectively, and two large empirical networks (GitHub and Catster), while the right panel shows an example of true versus reconstructed phase trajectories $x_i(t)$ for three representative nodes. b, SIS model on small-world and empirical networks: coefficient inference errors for the epidemic-spreading dynamics across network types and sizes. c, Michaelis--Menten (MM) regulatory model: coefficient inference errors for gene-regulatory dynamics across multiple network topologies and scales. d, FitzHugh--Nagumo model: coefficient inference errors for synthetic simulations and a large-scale empirical brain network. e, Hindmarsh--Rose model: coefficient inference errors across model terms, highlighting the distribution of errors across coefficients.
  • Figure 3: SIGN supports equation inference under noise, limited observations, complex (out-of-basis) dynamics, and imperfect network information. All experiments are repeated 10 times. Panels report coefficient errors (sMAPE) and representative trajectory reconstructions where applicable. a, Noise robustness: sMAPE under additive Gaussian noise as the signal-to-noise ratio (SNR) decreases (down to 60 dB) across the benchmark systems. b,c, Sampling resolution on ultra-large networks ($10^5$ nodes): b shows sMAPE as the number of observed time points is reduced (down to $200$); c shows sMAPE as the sampling interval $\Delta t$ is varied (with $\Delta t \in [0.01, 0.05]$). d,e, Non-canonical dynamics (basis mismatch): representative trajectory reconstructions for systems whose generating functions are not contained in the candidate library, including a mutualistic model with fractional nonlinearities and the chaotic Chua circuit; reconstruction error is quantified by mean squared error (MSE) across nodes. f, Phase heterogeneity: Kuramoto oscillators with natural frequencies drawn from $\mathcal{N}(0.3,\sigma)$; sMAPE as the dispersion $\sigma$ increases. g, Strong coupling: FitzHugh--Nagumo networks as coupling strength increases toward synchronized dynamics, quantified by the order parameter $\langle R\rangle \to 1$. h, Structured topologies: sMAPE on low-rank random graph models (RGM) and stochastic block models (SBM) with $1{,}000$ nodes. i, Structural incompleteness: sMAPE under missingness, with up to 20% node removal and 40% edge deletion prior to inference. j, Baselines and scalability: coefficient-error comparison with Two-stage and LaGNA under structural incompleteness, and runtime scaling with network size from $10$ to $10^5$ nodes.
  • Figure 4: Scalable prediction of large-scale FitzHugh--Nagumo neural dynamics under observational noise using SIGN. a, Representative phase-space trajectories obtained by integrating the inferred equations, shown for multiple noise conditions; solid and dashed curves indicate the training and testing intervals, respectively, and nodes are randomly sampled with noise intensities centered around the median. b, Distribution of node-wise prediction error (MSE) at SNR $=50$ dB (left) and $=30$ dB (right). c, Two-dimensional density plots comparing true versus predicted values at SNR $=50$ dB (left) and $=30$ dB (right); density indicates how frequently pairs fall into the same region. d, Prediction error (MSE) evaluated across temporal segments, with the training and testing intervals indicated. e, Comparison on a $1{,}000$-node simulated dataset under SNR $=50$ dB and $=30$ dB, reporting predictive errors for SIGN and neural-network-based predictors.
  • Figure 5: SIGN equation inference and integration-based prediction of large-scale sea surface temperature (SST) dynamics. a, Spatial map of node-wise MAPE over the 8-year training period (June 2002--June 2010), summarizing regional prediction error (mean MAPE $=1.93\%$). b, Spatial map of node-wise MAPE over the 2-year testing period (July 2010--June 2012) (mean MAPE $=3.55\%$). c, Representative SST trajectories at four randomly selected grid points: observed SST (blue) versus trajectories obtained by integrating the SIGN-inferred equations (purple). Solid and dashed segments indicate training and testing periods, respectively. d, Distribution of MAPE aggregated over all nodes and time steps. e, Two-dimensional density plot comparing predicted versus observed SST values (shown over $24$--$29^{\circ}\mathrm{C}$); density indicates the frequency of value pairs. f, Relationship between local SST variability (standard deviation) and node-wise prediction error (MAPE), evaluated on 10,000 randomly sampled nodes. g, MAPE as a function of prediction horizon over the full 10-year window, with the training/testing boundary indicated.
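
The captions above describe candidate libraries for intrinsic and coupling terms, a sparse regression step, and coefficient errors reported as sMAPE. The following is a minimal sketch of that general idea on a toy Kuramoto network, not the SIGN implementation: it uses plain sequentially thresholded least squares (a SINDy-style procedure) pooled over all nodes and omits the DBSCAN clustering and message-passing phases. The network size, parameter values, threshold, the helper names `coupling`, `stlsq`, and `smape`, and the particular sMAPE formula are all assumptions made for illustration. Data are generated with forward Euler and differentiated with matching forward differences, so recovery in this noise-free toy is essentially exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup; every value here is an illustrative assumption, not from the paper.
n, steps, dt = 20, 500, 0.01
omega_true, k_true = 0.3, 0.5

# Random undirected adjacency matrix without self-loops.
A = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A = A + A.T

def coupling(X, fn):
    """Return sum_j A_ij * fn(x_j - x_i) for each snapshot (row of X) and node i."""
    D = X[:, None, :] - X[:, :, None]  # D[t, i, j] = x_j - x_i
    return (A * fn(D)).sum(axis=2)

# Simulate dx_i/dt = omega + K * sum_j A_ij sin(x_j - x_i) with forward Euler.
x = np.empty((steps, n))
x[0] = rng.uniform(0.0, 2.0 * np.pi, n)
for t in range(steps - 1):
    x[t + 1] = x[t] + dt * (omega_true + k_true * coupling(x[t:t + 1], np.sin)[0])

# Forward-difference derivatives and a small candidate library:
# two intrinsic terms (1, sin x_i) and two coupling terms (sin and cos of x_j - x_i).
X, dX = x[:-1], (x[1:] - x[:-1]) / dt
names = ["1", "sin(x_i)", "sum_j A_ij sin(x_j - x_i)", "sum_j A_ij cos(x_j - x_i)"]
Theta = np.column_stack([
    np.ones(X.size),
    np.sin(X).ravel(),
    coupling(X, np.sin).ravel(),
    coupling(X, np.cos).ravel(),
])
y = dX.ravel()

def stlsq(Theta, y, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        if small.all():
            break
        coef[~small] = np.linalg.lstsq(Theta[:, ~small], y, rcond=None)[0]
    return coef

def smape(true, est):
    """Symmetric MAPE in percent (one common definition; 0/0 terms count as 0)."""
    true, est = np.asarray(true, float), np.asarray(est, float)
    denom = np.abs(true) + np.abs(est)
    ratio = np.divide(2.0 * np.abs(true - est), denom,
                      out=np.zeros_like(denom), where=denom > 0)
    return 100.0 * ratio.mean()

coef = stlsq(Theta, y)
true = np.array([omega_true, 0.0, k_true, 0.0])
error = smape(true, coef)

for name, c in zip(names, coef):
    print(f"{name:28s} {c: .4f}")
print(f"sMAPE = {error:.2e} %")
```

In this sketch the two spurious candidates, `sin(x_i)` and the cosine coupling, fall below the threshold and are pruned, leaving the constant frequency and the sinusoidal coupling term. With noisy or coarsely sampled data the derivative estimate and the threshold would both need more care than shown here.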