Beyond ReLU: Bifurcation, Oversmoothing, and Topological Priors

Erkan Turan, Gaspard Abel, Maysam Behmanesh, Emery Pierson, Maks Ovsjanikov

TL;DR

Deep GNNs oversmooth because node features converge to a stable homogeneous fixed point. The central contribution is the theoretical discovery that this undesired stability can be broken by replacing standard monotone activations with a class of functions. The authors analytically prove that this substitution induces a bifurcation that destabilizes the homogeneous state and creates a new pair of stable, non-homogeneous patterns that provably resist oversmoothing.

Abstract

Graph Neural Networks (GNNs) learn node representations through iterative network-based message-passing. While powerful, deep GNNs suffer from oversmoothing, where node features converge to a homogeneous, non-informative state. We re-frame this problem of representational collapse from a bifurcation theory perspective, characterizing oversmoothing as convergence to a stable "homogeneous fixed point." Our central contribution is the theoretical discovery that this undesired stability can be broken by replacing standard monotone activations (e.g., ReLU) with a class of functions. Using Lyapunov-Schmidt reduction, we analytically prove that this substitution induces a bifurcation that destabilizes the homogeneous state and creates a new pair of stable, non-homogeneous patterns that provably resist oversmoothing. Our theory predicts a precise, nontrivial scaling law for the amplitude of these emergent patterns, which we quantitatively validate in experiments. Finally, we demonstrate the practical utility of our theory by deriving a closed-form, bifurcation-aware initialization and showing its utility in real benchmark experiments.

Paper Structure (68 sections, 8 theorems, 103 equations, 5 figures, 2 tables)

Key Result

Lemma 3.1

Let $f$ be an operator with Lipschitz constant $\|f\|_{\mathrm{Lip}} < 1$. Then, for any initial condition $x^{(0)}$, the iteration $x^{(\ell+1)} = f(x^{(\ell)})$ converges linearly to the unique fixed point of $f$.
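The lemma can be checked numerically on a toy map. The sketch below is only an illustration under stated assumptions: the map $f(x) = 0.5\cos(x)$, the starting points, and the helper name `iterate` are hypothetical choices made here, not taken from the paper. Since $|f'(x)| \le 0.5$, this $f$ is a contraction, so every trajectory should reach the same fixed point and the error should shrink by at least a factor of 0.5 per step.

```python
import math


def iterate(f, x0, steps):
    """Return the trajectory [x0, f(x0), f(f(x0)), ...] with steps + 1 entries."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


# Assumed toy map: |f'(x)| = 0.5 * |sin(x)| <= 0.5, so the Lipschitz constant is below 1.
LIP = 0.5


def f(x):
    return 0.5 * math.cos(x)


# Two very different initial conditions.
traj_a = iterate(f, x0=10.0, steps=60)
traj_b = iterate(f, x0=-3.0, steps=60)
x_star = traj_a[-1]  # numerical estimate of the unique fixed point

# Linear convergence: each error is at most LIP times the previous one.
errors = [abs(x - x_star) for x in traj_a[:15]]
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

print("fixed point estimate:", x_star)
print("largest error ratio:", max(ratios))
```

Both trajectories end at the same point regardless of where they start, which is the scalar analogue of node features collapsing to a single homogeneous state.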

Figures (5)

  • Figure 1: Effective potential landscape across the bifurcation. Left: Supercritical pitchfork bifurcation. Right: Transcritical bifurcation. For both bifurcation types, below the critical coupling ($w<1$) the system is in the subcritical regime: the homogeneous state $a=0$ is the unique stable minimum, corresponding to oversmoothing. Above it ($w>1$), the origin becomes unstable (hollow circle) and stable minima emerge (filled circles), representing non-homogeneous pattern formation that resists oversmoothing.
  • Figure 2: Phase diagram validating Theorem \ref{thm:pitchfork_general}. Left: Dirichlet energy $E_D$ as a function of activation slope $\alpha$ and coupling $w$. The predicted critical boundary (dashed red) precisely separates the oversmoothing regime $E_D \approx 0$ from the pattern-forming regime $E_D>0$. Right: Node representations at marked points. In the subcritical regime all representations collapse to zero (homogeneous fixed point). In the supercritical regime, a structured pattern emerges.
  • Figure 3: Empirical validation of theoretical predictions. Dirichlet energy, amplitude and scaling laws versus normalized coupling $\frac{w}{w_k}$ on Erdős-Rényi, Barabási-Albert, Watts-Strogatz and Random-Regular graph topologies.
  • Figure 4: Spectral filtering controls mode selection. Top: Bandpass filters $P(\lambda)$ centered at low, mid and high frequencies. The dashed line is the critical threshold $\frac{1}{\alpha w}$; only modes that exceed this threshold can bifurcate. Bottom: Stable patterns on a $10 \times 10$ grid graph obtained by iterating $x^{(\ell+1)}=\phi(wP(A)x^{(\ell)})$ to convergence from random initializations. Each filter selects a distinct eigenmode: smooth (low-frequency), striped (mid-frequency), or checkerboard (high-frequency), which illustrates topological prior selection without collapse.
  • Figure 5: Depth Robustness and Phase Transitions on CORA. (a, b) $E_{D}^{0}$ and accuracy vs. depth. Activations with stabilizing cubic terms (sin, tanh) resist oversmoothing up to 64 layers, whereas ReLU collapses. (c, d) Phase transition at depth 64. Varying the bifurcation parameter $\delta$ reveals a transition near $\delta=0$; only supercritical initialization ($\delta > 0$) enables pattern formation and successful learning.
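The iteration in the Figure 4 caption, $x^{(\ell+1)}=\phi(wP(A)x^{(\ell)})$, can be reproduced in miniature. The sketch below is not the paper's experimental setup; it makes several assumptions of its own: an 8-node cycle graph instead of the $10 \times 10$ grid, $\phi = \tanh$ (slope $\alpha = 1$ at the origin), the normalized Laplacian $L = I - A/2$ standing in for a high-pass filter $P(A)$, and Dirichlet energy taken as $x^\top L x$. Under these assumptions the largest filter eigenvalue is 2, so the threshold $P(\lambda) > \frac{1}{\alpha w}$ is first crossed at $w = 0.5$.

```python
import numpy as np


def cycle_laplacian(n):
    """Normalized Laplacian L = I - A/2 of the n-node cycle (every degree is 2)."""
    A = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = 1.0
        A[j, i] = 1.0
    return np.eye(n) - A / 2.0


def run(w, L, steps=3000, seed=0):
    """Iterate x <- tanh(w * L @ x) from a small random initialization."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(L.shape[0])
    for _ in range(steps):
        x = np.tanh(w * (L @ x))
    return x


def dirichlet_energy(x, L):
    """Quadratic form x^T L x; it is zero for constant (homogeneous) features."""
    return float(x @ L @ x)


L = cycle_laplacian(8)

x_sub = run(0.40, L)  # w * 2 < 1: below the assumed threshold
x_sup = run(0.55, L)  # w * 2 > 1: only the alternating mode is unstable

print("subcritical energy:  ", dirichlet_energy(x_sub, L))
print("supercritical energy:", dirichlet_energy(x_sup, L))
print("supercritical signs: ", np.sign(x_sup).astype(int))
```

Below the threshold the features decay to zero and the energy vanishes; just above it the features settle into an alternating-sign pattern, the cycle-graph counterpart of the checkerboard mode. The value $w = 0.55$ is chosen so that the second-largest mode ($\lambda \approx 1.71$) stays below threshold.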

Theorems & Definitions (15)

  • Lemma 3.1
  • Theorem 3.2: Supercritical pitchfork from odd $C^3$ activations
  • Remark 3.3
  • Corollary 3.4: Dirichlet Energy
  • Corollary 3.5: Effective Potential Energy
  • Corollary 4.1: NTK Mode Selection at Bifurcation
  • Remark 4.2
  • Theorem 5.1: GNN Bifurcation on realistic GNNs
  • Corollary 5.2: Critical initialization
  • Remark 6.1
  • ...and 5 more
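
The pitchfork picture named in Theorem 3.2 and drawn in Figure 1 has a one-dimensional caricature that is easy to simulate. The sketch below is a textbook scalar illustration, not the paper's graph-level result or its scaling law: it assumes $\phi = \tanh$ and the scalar map $a \mapsto \tanh(wa)$, and the function names are hypothetical. Truncating $\tanh(x) \approx x - x^3/3$ gives the fixed-point condition $a = wa - (wa)^3/3$, so for $w > 1$ the nonzero solutions are $a^\ast \approx \pm\sqrt{3(w-1)/w^3}$, which grows like $\sqrt{w-1}$ near the critical coupling.

```python
import math


def pattern_amplitude(w, a0=0.5, steps=20000):
    """Iterate a <- tanh(w * a) and return the value it settles on."""
    a = a0
    for _ in range(steps):
        a = math.tanh(w * a)
    return a


def leading_order_amplitude(w):
    """Cubic-truncation estimate sqrt(3 (w - 1) / w^3); zero at or below w = 1."""
    if w <= 1.0:
        return 0.0
    return math.sqrt(3.0 * (w - 1.0) / w**3)


for w in (0.8, 1.01, 1.05, 1.2):
    simulated = pattern_amplitude(w)
    estimate = leading_order_amplitude(w)
    print(f"w={w:<5} simulated={simulated:.4f} leading-order={estimate:.4f}")
```

For $w < 1$ the iteration collapses to $a = 0$. For $w$ slightly above 1 the simulated amplitude stays close to the leading-order estimate, and the gap widens as $w$ moves away from the critical value, where the neglected higher-order terms of $\tanh$ start to matter.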