Optimal Recurrent Network Topologies for Dynamical Systems Reconstruction

Christoph Jürgen Hemmer; Manuel Brenner; Florian Hess; Daniel Durstewitz

TL;DR

This paper shows that magnitude-based weight pruning is ineffective for dynamical systems reconstruction (DSR), whereas geometry-based pruning, which removes weights that contribute little to attractor structure, can dramatically reduce network size without harming DSR quality. The resulting sparse RNN topologies combine hub-like and small-world characteristics, and the paper introduces GeoHub, a method that automatically generates such topologies to serve as priors for DS modeling. Empirical evaluations across chaotic and non-chaotic benchmarks (Lorenz-63, bursting neuron, ECG, Lorenz-96, Rössler) demonstrate that topology, not weight magnitude, primarily drives performance, and that geometry-based pruning can achieve up to 95% sparsity while maintaining fidelity in attractor geometry ($D_{\text{stsp}}$) and long-term spectra ($D_{\text{H}}$). The findings suggest a topology-centric form of the Lottery Ticket Hypothesis for DSR, with practical implications for designing efficient, interpretable RNN priors for dynamical modeling.

Abstract

In dynamical systems reconstruction (DSR) we seek to infer from time series measurements a generative model of the underlying dynamical process. This is a prime objective in any scientific discipline, where we are particularly interested in parsimonious models with a low parameter load. A common strategy here is parameter pruning, that is, removing all parameters with small magnitude. However, we find that this strategy does not work for DSR, where even low-magnitude parameters can contribute considerably to the system dynamics. On the other hand, it is well known that many natural systems which generate complex dynamics, like the brain or ecological networks, have a sparse topology with comparatively few links. Inspired by this, we show that geometric pruning, which in contrast to magnitude-based pruning removes weights with a low contribution to an attractor's geometrical structure, indeed manages to reduce parameter load substantially without significantly hampering DSR quality. We further find that the networks resulting from geometric pruning have a specific type of topology, and that this topology, and not the magnitude of weights, is what is most crucial to performance. We provide an algorithm that automatically generates such topologies, which can be used as priors for generative modeling of dynamical systems by RNNs, and compare it to other well-studied topologies like small-world or scale-free networks.
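The contrast between the two pruning criteria can be illustrated with a minimal sketch. It is not the authors' implementation: it prunes in a single shot without retraining, and `toy_disagreement`, `SENSITIVITY`, and `W_trained` are hypothetical stand-ins for a real attractor-geometry measure and a trained weight matrix. The only idea taken from the text is that a weight's geometric importance is the increase in disagreement observed when that weight alone is removed.

```python
from typing import Callable, List

Matrix = List[List[float]]


def magnitude_scores(W: Matrix) -> Matrix:
    """Magnitude criterion: a weight's score is its absolute value."""
    return [[abs(w) for w in row] for row in W]


def ablation_scores(W: Matrix, disagreement: Callable[[Matrix], float]) -> Matrix:
    """Geometry-style criterion: a weight's score is the increase in
    disagreement when that single weight is set to zero."""
    base = disagreement(W)
    scores = [[0.0 for _ in row] for row in W]
    for i, row in enumerate(W):
        for j, _ in enumerate(row):
            trial = [list(r) for r in W]  # copy, then remove one weight
            trial[i][j] = 0.0
            scores[i][j] = disagreement(trial) - base
    return scores


def prune_mask(scores: Matrix, fraction: float) -> List[List[int]]:
    """Return a 0/1 mask that removes the lowest-scoring fraction of weights."""
    ranked = sorted(
        (s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)
    )
    mask = [[1 for _ in row] for row in scores]
    for _, i, j in ranked[: int(fraction * len(ranked))]:
        mask[i][j] = 0
    return mask


# Hypothetical toy setup (assumption): the small weight at (0, 1) matters a
# lot for the "geometry", while the large weight at (1, 0) barely matters.
W_trained = [[0.9, 0.05], [2.0, 0.5]]
SENSITIVITY = [[1.0, 50.0], [0.01, 1.0]]


def toy_disagreement(W: Matrix) -> float:
    """Stand-in for a real measure such as D_stsp; NOT the paper's definition."""
    return sum(
        SENSITIVITY[i][j] * abs(W[i][j] - W_trained[i][j])
        for i in range(2)
        for j in range(2)
    )


mask_magnitude = prune_mask(magnitude_scores(W_trained), 0.5)
mask_geometry = prune_mask(ablation_scores(W_trained, toy_disagreement), 0.5)
```

In this toy case the magnitude criterion discards the small but influential weight at position (0, 1), while the ablation-based criterion keeps it and instead discards the large weight at (1, 0). In a real setting each call to `disagreement` would involve simulating the model, so the ablation loop is far more expensive than ranking by magnitude.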

Paper Structure (36 sections, 33 equations, 30 figures, 5 algorithms)

Introduction
Related Work
Dynamical Systems Reconstruction (DSR)
Pruning and Lottery Ticket Hypothesis
Network Topology of Real World Systems
Network Topology in RNNs
Methodological Setting
DSR Model and Training
Weight Pruning
Analysis of Network Topology
Results
Performance Evaluation
Geometry-Based, but not Magnitude-Based, Pruning Allows for Substantial Reduction in Network Size
Network Topology, not Weight Configuration is Essential to Performance
Distilling Network Topology for Enhanced DSR Training
...and 21 more sections

Figures (30)

Figure 1: a) Illustration of geometry-based pruning. The top shows the (ground truth) iconic Lorenz-63 chaotic attractor (Lorenz, 1963; blue) and an optimal PLRNN reconstruction (red), while below three reconstructions are shown with a single weight parameter removed that has high (leftmost), medium (center) or low (rightmost) influence on attractor geometry. The measure of geometrical (dis)agreement ($D_{\text{stsp}}$) is given on top of each graph, and the geometric importance score and magnitude of the pruned parameter are indicated below. b) Weight parameters with large ($\Delta D_{\text{stsp}}>0.1$) vs. low ($\Delta D_{\text{stsp}}\leq0.1$) impact on geometrical reconstruction quality do not substantially differ in absolute magnitude. c) Change in geometrical disagreement ($\Delta D_{\text{stsp}}$) vs. weight magnitude for PLRNNs trained on the Lorenz-63. Note there is no discernible trend for larger weights to associate with stronger effects on attractor geometry. d) The effects of weight removal on $\Delta D_{\text{stsp}}$ are largely additive, with simultaneous removal of two weights having about the same effect as the sum of the individual weight effects.
Figure 2: Approach for translating graph-topological properties of trained networks into a general scheme to be used as topological prior.
Figure 3: Quantification of DS reconstruction quality in terms of attractor geometry disagreement ($D_{\text{stsp}}$, left column) and disagreement in long-term temporal structure ($D_{\text{H}}$, right column) as a function of network pruning (x-axis, exponential scale) and different pruning criteria. Error bars = SEM.
Figure 4: Difference in $D_{\text{stsp}}$ when using the initial weights $\bm{\theta}_0$ and reinitialized weights $\bm{\theta}_*$ shows there is no strong or consistent influence of the specific weight initialization. Error bands = standard deviation.
Figure 5: Graph properties of geometrically pruned, Barabási-Albert (BA), Watts-Strogatz (WS), and Erdős–Rényi (ER) networks, with $92.4\%$ of parameters removed and averaged across all datasets with $M=50$. a) Cumulative degree distribution $F(k')$ as a function of normalized degree $k'=\frac{k}{n-1}$, separated according to in- and out-degree (for geometrically pruned network). b) Comparison of degree distributions $P(k')$ for readout vs. hidden nodes of geometrically pruned networks. c) Average path lengths $L$ for all four network topologies. Note that Erdős–Rényi graphs are not a naive baseline here, but are also known to have small path length (Watts & Strogatz, 1998). d) Clustering coefficients $C$ for the same. See also Fig. \ref{fig:network_graph_properties_M100}.
...and 25 more figures
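
The graph quantities named in the Figure 5 caption (normalized degree $k'=\frac{k}{n-1}$, average path length $L$, clustering coefficient $C$) can be computed from a binary connectivity mask as in the following minimal sketch. Several choices here are assumptions rather than the paper's definitions: `adj[i][j] == 1` is read as an edge from node `i` to node `j`, self-connections are ignored, path lengths are averaged only over ordered pairs that are actually connected, and clustering is computed on the undirected version of the graph.

```python
from collections import deque
from typing import List, Tuple

Adjacency = List[List[int]]


def normalized_degrees(adj: Adjacency) -> Tuple[List[float], List[float]]:
    """Return (in-degree, out-degree) per node, each divided by n - 1."""
    n = len(adj)
    out_deg = [sum(1 for j in range(n) if j != i and adj[i][j]) / (n - 1)
               for i in range(n)]
    in_deg = [sum(1 for i in range(n) if i != j and adj[i][j]) / (n - 1)
              for j in range(n)]
    return in_deg, out_deg


def average_path_length(adj: Adjacency) -> float:
    """Mean directed shortest-path length over connected ordered pairs (BFS)."""
    n = len(adj)
    total, pairs = 0, 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("inf")


def clustering_coefficient(adj: Adjacency) -> float:
    """Average local clustering coefficient of the undirected graph."""
    n = len(adj)
    nbrs = [{j for j in range(n) if j != i and (adj[i][j] or adj[j][i])}
            for i in range(n)]
    coeffs = []
    for i in range(n):
        k = len(nbrs[i])
        if k < 2:
            coeffs.append(0.0)  # fewer than two neighbours: no triangles possible
            continue
        links = sum(1 for u in nbrs[i] for v in nbrs[i] if u < v and v in nbrs[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / n


# Hypothetical example: a star graph whose hub (node 0) is connected in both
# directions to three peripheral nodes.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
star_in, star_out = normalized_degrees(star)
star_L = average_path_length(star)
star_C = clustering_coefficient(star)
```

For the star graph the hub reaches the maximal normalized degree while the peripheral nodes do not, which is the kind of heterogeneity a cumulative degree distribution makes visible. For networks of realistic size a dedicated graph library would normally be preferable to these quadratic-time loops.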
