When do neural ordinary differential equations generalize on complex networks?

Moritz Laber; Tina Eliassi-Rad; Brennan Klein

TL;DR

This paper probes how neural ODEs trained on dynamical systems in Barabási–Barzel (BB) form generalize when deployed on complex graphs generated by the $\mathbb{S}^1$-model. By replacing the analytic components of the BB vector field with neural networks, the authors create nODEs and evaluate them along four dimensions: scaling to larger graphs, generalization across graph properties, faithful representation of fixed points and their stability, and robustness to partial observability. They find that degree heterogeneity and the underlying dynamical system largely determine generalization, with clustering playing a secondary role; the learned fixed points tend to be stable but may not coincide with the true fixed points, and performance degrades as the number of unobserved nodes increases, particularly on highly heterogeneous graphs. The results underscore both the potential of nODEs to illuminate complex network dynamics and the challenges posed by realistic topologies, motivating diverse evaluation frameworks and architectural refinements for robust deployment. Overall, the work provides a principled template for assessing data-driven dynamical models on graphs and highlights key structural factors, especially degree heterogeneity, that limit generalization across scales and topologies.
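For reference, the BB form described in the Figure 1 caption, its nODE counterpart, and the Jacobian used to judge fixed-point stability can be written out as follows. The notation follows the Figure 1 caption; the symbol $F_i$ and the Jacobian expression are a derivation added here for illustration and may differ in notation from the paper's Methods section.

```latex
\begin{aligned}
\frac{\mathrm{d}x_i}{\mathrm{d}t} &= F_i(\mathbf{x}) = f(x_i) + \sum_{j=1}^{n} A_{ij}\, h^\mathrm{ego}(x_i)\, h^\mathrm{alt}(x_j), \\
\frac{\mathrm{d}\hat{x}_i}{\mathrm{d}t} &= f_\omega(\hat{x}_i) + \sum_{j=1}^{n} A_{ij}\, h_\omega^\mathrm{ego}(\hat{x}_i)\, h_\omega^\mathrm{alt}(\hat{x}_j), \\
J_{ij} = \frac{\partial F_i}{\partial x_j} &= \delta_{ij}\Big[f'(x_i) + h^{\mathrm{ego}\,\prime}(x_i) \sum_{k=1}^{n} A_{ik}\, h^\mathrm{alt}(x_k)\Big] + A_{ij}\, h^\mathrm{ego}(x_i)\, h^{\mathrm{alt}\,\prime}(x_j).
\end{aligned}
```

A fixed point $\mathbf{x}^\star$ with $F_i(\mathbf{x}^\star) = 0$ for all $i$ is locally asymptotically stable if every eigenvalue of $J(\mathbf{x}^\star)$ has negative real part. Substituting $f_\omega, h_\omega^\mathrm{ego}, h_\omega^\mathrm{alt}$ for the analytic terms gives the nODE Jacobian, whose leading eigenvalue is presumably the quantity $\hat{\lambda}_1$ reported in Figure 4.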

Abstract

Neural ordinary differential equations (neural ODEs) can effectively learn dynamical systems from time series data, but their behavior on graph-structured data remains poorly understood, especially when they are applied to graphs whose size or structure differs from that of the graphs seen during training. We study neural ODEs ($\mathtt{nODE}$s) with vector fields following the Barabási-Barzel form, trained on synthetic data from five common dynamical systems on graphs. Using the $\mathbb{S}^1$-model to generate graphs with realistic and tunable structure, we find that degree heterogeneity and the type of dynamical system are the primary factors in determining $\mathtt{nODE}$s' ability to generalize across graph sizes and properties. This extends to $\mathtt{nODE}$s' ability to capture fixed points and maintain performance amid missing data. Average clustering plays a secondary role in determining $\mathtt{nODE}$ performance. Our findings highlight $\mathtt{nODE}$s as a powerful approach to understanding complex systems but underscore challenges emerging from degree heterogeneity and clustering in realistic graphs.
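The setup in the abstract (a BB-form vector field, synthetic trajectories from a known dynamical system, and a neural counterpart compared by MAE) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the SIS terms use a common textbook parameterization with illustrative constants B = 1 and R = 0.5, the graph is an Erdős–Rényi random graph rather than an $\mathbb{S}^1$-model graph, the helpers `make_mlp`, `rk4`, and `node_wise_mae` are hypothetical, and the nODE components are random, untrained MLPs, so the printed MAE only serves as a smoke test. Training (fitting the MLP weights to reduce the error) is omitted.

```python
import numpy as np

# BB-form vector field: dx_i/dt = f(x_i) + h_ego(x_i) * sum_j A_ij h_alt(x_j)
def bb_vector_field(x, A, f, h_ego, h_alt):
    """Evaluate the BB-form vector field for all n nodes at once (x has shape (n,))."""
    return f(x) + h_ego(x) * (A @ h_alt(x))

# Assumed textbook SIS parameterization; B and R are illustrative, not the paper's values.
B, R = 1.0, 0.5

def sis_f(x):
    return -B * x  # recovery (self-dynamics)

def sis_h_ego(x):
    return R * (1.0 - x)  # susceptibility of the focal node

def sis_h_alt(x):
    return x  # infectiousness of a neighbor

def make_mlp(rng, hidden=16):
    """Hypothetical scalar-to-scalar MLP with one tanh hidden layer (random, untrained weights)."""
    W1 = rng.normal(scale=0.5, size=(1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)

    def mlp(x):
        h = np.tanh(x[:, None] @ W1 + b1)  # (n, hidden)
        return (h @ W2 + b2)[:, 0]         # (n,)
    return mlp

def rk4(vector_field, x0, t_grid):
    """Fixed-step RK4 for an autonomous ODE; returns the state at every time in t_grid."""
    x = x0
    states = [x0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        k1 = vector_field(x)
        k2 = vector_field(x + 0.5 * dt * k1)
        k3 = vector_field(x + 0.5 * dt * k2)
        k4 = vector_field(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        states.append(x)
    return np.stack(states)  # shape (n_tau, n)

def node_wise_mae(pred, truth):
    """Assumed definition: MAE over time points for each node, then averaged over nodes."""
    return np.abs(pred - truth).mean(axis=0).mean()

rng = np.random.default_rng(0)
n = 64
upper = np.triu(rng.random((n, n)) < 0.1, k=1)  # Erdős–Rényi stand-in, no self-loops
A = (upper | upper.T).astype(float)
x0 = rng.random(n)  # shared initial condition for ground truth and nODE
t_grid = np.linspace(0.0, 5.0, 51)

truth = rk4(lambda x: bb_vector_field(x, A, sis_f, sis_h_ego, sis_h_alt), x0, t_grid)
f_w, h_ego_w, h_alt_w = (make_mlp(rng) for _ in range(3))
pred = rk4(lambda x: bb_vector_field(x, A, f_w, h_ego_w, h_alt_w), x0, t_grid)
mae = node_wise_mae(pred, truth)
print(f"node-wise MAE of the untrained nODE: {mae:.3f}")
```

Because the nODE reuses `bb_vector_field` with learned components in place of the analytic ones, it can be applied unchanged to any adjacency matrix `A`, which is what makes evaluation on larger or structurally different test graphs possible in the first place.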

Paper Structure (31 sections, 27 equations, 57 figures, 3 tables)

Introduction
Results
Overview
Generalizing to larger graphs
Generalizing across graph properties
Fixed point approximation and local stability
Robustness on partially observed graphs
Discussion
Methods
Dynamical systems in Barabási-Barzel form
Neural ODE architecture
Training
Generating graphs
Evaluation strategies
Acknowledgments
...and 16 more sections

Figures (57)

Figure 1: Schematic depiction of (neural) ODEs on graphs and our four evaluation strategies. (a) In dynamical systems of BB form, the vector field at a specific node $i$ (orange) consists of the self-dynamics $f(x_i)$, describing the time evolution of node $i$'s state $x_i(t)$ in the absence of any interactions, and a factorized interaction term $h^\mathrm{ego}(x_i)h^\mathrm{alt}(x_j)$ describing how nodes $i$ and $j$ (blue) interact if they are connected, i.e., $A_{ij}=1$. (b) We investigate neural ODEs that mirror this structure (nODEs) but replace $f, \, h^\mathrm{ego}, \, h^\mathrm{alt}$ with neural networks $f_\omega$ (light green), $h_\omega^\mathrm{ego}$ (red), and $h_\omega^\mathrm{alt}$ (purple), each with its own parameters. We evaluate these models by comparing their predictions $\mathbf{\hat{x}}(t)$ (pink) at discrete time points $\{\tau_r\}_{r=1}^{n_\tau}$ (dark red) with the ground-truth time evolution $\mathbf{x}(t)$ (gray) starting from the same initial conditions $\mathbf{\tilde{x}}$ (dark green). We evaluate nODEs in terms of their ability (c) to generalize to graphs much larger than the training graph, (d) to generalize to graphs that differ in their parameters from the training graph, (e) to faithfully capture fixed points and their stability, and (f) to make accurate predictions if only a subset of nodes is observed at test time.
Figure 2: Generalization across graph sizes. The ability of nODEs trained to approximate different dynamical systems (SIS (blue circles), MAK (yellow squares), MM (red triangles), ND (green diamonds), BD (purple hexagons)) on small graphs, $n^\mathrm{train}=64$, to make accurate predictions on larger graphs, $n^\mathrm{test}$, with the same properties as the training graph, $(\gamma^\mathrm{test}, \, \beta^\mathrm{test})=(\gamma^\mathrm{train}, \, \beta^\mathrm{train})=(\gamma, \, \beta)$, differs between (a) very degree heterogeneous graphs with weak clustering, $(\gamma, \, \beta)=(2.1, \, 0.1)$, (b) very degree heterogeneous graphs with strong clustering, $(\gamma, \, \beta)=(2.1, \, 4.1)$, (c) graphs with moderate degree heterogeneity and clustering, $(\gamma, \, \beta)=(3.0, \, 1.1)$, (d) less degree heterogeneous graphs with weak clustering, $(\gamma, \, \beta)=(3.9, \, 0.1)$, and (e) less degree heterogeneous graphs with strong clustering, $(\gamma, \, \beta)=(3.9, \, 4.1)$. The mean node-wise MAE, $\bar{\mathcal{L}}_\mathrm{mae}$, over $n_G^\mathrm{test}=100$ test graphs stays constant or increases slowly on less degree heterogeneous graphs, independent of clustering, but increases noticeably on more degree heterogeneous graphs for most nODEs. Those trained on the SIS model show the smallest increase in mean MAE. This means degree heterogeneity is a limiting factor for size generalization of nODEs predicting dynamical systems on graphs.
Figure 3: Generalizing to graphs with different properties. The ability of nODEs trained on small graphs, $n^\mathrm{train}=64$, with moderate degree heterogeneity and clustering, $(\gamma^\mathrm{train}, \, \beta^\mathrm{train})=(3.0, \, 1.1)$ (black cross), to generalize to graphs with different properties, $(\gamma^\mathrm{test}, \, \beta^\mathrm{test})$, on $n_G^\mathrm{test}=100$ test graphs of the same size, $n^\mathrm{test}=64$ (upper row), or larger size, $n^\mathrm{test}=8192$ (lower row), is measured as the change in mean node-wise MAE (color-coded) relative to its value on graphs with the same properties and size as the training graph, denoted $\bar{\mathcal{L}}^\prime_\mathrm{mae}$. On small graphs, this normalized MAE decreases towards lower degree heterogeneity and less clustering, and increases towards more degree heterogeneity and more clustering for all dynamical systems ((a) SIS, (b) MAK, (c) MM, (d) ND, (e) BD model). On larger graphs, the normalized MAE increases throughout the $\mathbb{S}^1$-model parameter range independent of the dynamical system ((f) SIS, (g) MAK, (h) MM, (i) ND, (j) BD), and most strongly towards higher degree heterogeneity. The increase is moderate for the nODE trained on the SIS model but substantial for nODEs trained on the ND model. This means nODEs are to some extent robust to changes in graph properties, especially if deployed on graphs with low degree heterogeneity and clustering and similar size to the training graph.
Figure 4: Local stability and fixed point approximation. Across dynamical systems (SIS (blue circles), MAK (yellow squares), MM (red triangles), ND (green diamonds), BD (purple hexagons)) and graph parameters ((a), (b) $(\gamma, \, \beta)=(2.1, \, 0.1)$, (c), (d) $(\gamma, \, \beta)=(2.1, \, 4.1)$, (e), (f) $(\gamma, \, \beta)=(3.0, \, 1.1)$, (g), (h) $(\gamma, \, \beta)=(3.9, \, 0.1)$, (i), (j) $(\gamma, \, \beta)=(3.9, \, 4.1)$), the largest eigenvalue $\hat{\lambda}_1$ of the Jacobian (upper row) remains negative (shaded band) for nODEs trained on small graphs, $n^\mathrm{train}=64$, with the same parameters as the test graph, $(\gamma^\mathrm{train}, \, \beta^\mathrm{train})=(\gamma^\mathrm{test}, \, \beta^\mathrm{test})=(\gamma, \, \beta)$, even though it increases on average (solid line) with the size of the test graph $n^\mathrm{test}$. The average MAE (lower row) of the fixed point approximation, $\bar{\mathcal{L}}_\mathrm{mae}^\star$, increases noticeably across dynamical systems on very degree heterogeneous graphs independent of clustering but stays constant or increases only slightly in the less degree heterogeneous case. This means that even though nODEs approach stable fixed points, these fixed points can differ from the dynamical system's true fixed point, especially on very degree heterogeneous graphs.
Figure 5: Robustness of predictions to unobserved nodes. The robustness of nODEs trained on small graphs, $n^\mathrm{train}=64$, to unobserved nodes in larger test graphs, $n^\mathrm{test}=8192$, with the same parameters as the training graph, $(\gamma^\mathrm{test}, \, \beta^\mathrm{test}) =(\gamma^\mathrm{train}, \, \beta^\mathrm{train}) =(\gamma, \, \beta)$, depends on the number of observed nodes $n^\mathrm{obs}$, graph properties ((a) $(\gamma, \, \beta)=(2.1, \, 0.1)$, (b) $(\gamma, \, \beta)=(2.1, \, 4.1)$, (c) $(\gamma, \, \beta)=(3.0, \, 1.1)$, (d) $(\gamma, \, \beta)=(3.9, \, 0.1)$, (e) $(\gamma, \, \beta)=(3.9, \, 4.1)$), and the dynamical system (SIS (blue circles), MAK (yellow squares), MM (red triangles), ND (green diamonds), BD (purple hexagons)). Overall, the mean node-wise MAE, $\bar{\mathcal{L}}_\mathrm{mae}$, when $n^\mathrm{test}-n^\mathrm{obs}$ nodes remain unobserved (solid line) stays at the baseline value for a completely observed graph (dashed line) for longer on very degree heterogeneous than on less degree heterogeneous graphs, with clustering playing a minor role. However, the MAE tends to be higher in the very degree heterogeneous than in the less degree heterogeneous setting. This means robustness to unobserved nodes depends on a complex interplay of the type of dynamical system and graph properties.
...and 52 more figures
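All captions above parameterize graphs by the $\mathbb{S}^1$-model's degree-heterogeneity exponent $\gamma$ (smaller means more heterogeneous) and clustering parameter $\beta$ (larger means stronger clustering). The sketch below samples such a graph with the standard $\mathbb{S}^1$ construction: hidden degrees with a power-law tail, uniform angles on a circle, and connection probability $1/(1+\chi^\beta)$. Assumptions: the formulas for $\kappa_0$ and $\mu$ used here hold only for $\gamma > 2$ and $\beta > 1$, so the $\beta = 0.1$ settings in the figures are out of scope; the function `s1_graph`, the mean degree of 10, and the dense $O(n^2)$ implementation are illustrative choices, not the generator described in the paper's "Generating graphs" section.

```python
import numpy as np

def s1_graph(n, gamma, beta, mean_degree, seed=None):
    """Sample an undirected S^1-model graph as a dense adjacency matrix.

    Hypothetical sketch: the kappa_0 and mu formulas below assume gamma > 2 and beta > 1.
    """
    if gamma <= 2 or beta <= 1:
        raise ValueError("this sketch assumes gamma > 2 and beta > 1")
    rng = np.random.default_rng(seed)

    # Hidden degrees: Pareto tail P(kappa) ~ kappa^(-gamma), with kappa_0 set so E[kappa] = mean_degree.
    kappa0 = mean_degree * (gamma - 2) / (gamma - 1)
    kappa = kappa0 * (1.0 - rng.random(n)) ** (-1.0 / (gamma - 1))

    # Angular coordinates: uniform on a circle of radius n / (2*pi), i.e. unit node density.
    theta = rng.uniform(0.0, 2 * np.pi, n)
    radius = n / (2 * np.pi)

    # mu calibrates each node's expected degree to its hidden degree (valid for beta > 1).
    mu = beta * np.sin(np.pi / beta) / (2 * np.pi * mean_degree)

    # Shortest angular separation between every pair of nodes, in [0, pi].
    dtheta = np.pi - np.abs(np.pi - np.abs(theta[:, None] - theta[None, :]))
    chi = radius * dtheta / (mu * np.outer(kappa, kappa))
    p = 1.0 / (1.0 + chi ** beta)  # larger beta -> stronger geometric clustering

    # Draw each undirected edge once from the strict upper triangle, then symmetrize.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(int)
    return A, kappa, theta

A, kappa, theta = s1_graph(n=2000, gamma=3.9, beta=4.1, mean_degree=10.0, seed=1)
degrees = A.sum(axis=1)
print(f"mean degree {degrees.mean():.2f}, max degree {degrees.max()}")
```

The dense pairwise matrices make this sketch memory-hungry at the $n^\mathrm{test}=8192$ scale used in the figures; a sparse or angular-binning implementation would be the natural next step there.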
