Probing Graph Neural Network Activation Patterns Through Graph Topology

Floriano Tori, Lorenzo Bini, Marco Sorbi, Stéphane Marchand-Maillet, Vincent Ginis

TL;DR

This work reframes curvature as a diagnostic probe for understanding when and why graph learning fails, and identifies a systemic curvature shift: global attention mechanisms exacerbate topological bottlenecks, drastically increasing the prevalence of negative curvature.

Abstract

Curvature notions on graphs provide a theoretical description of graph topology, highlighting bottlenecks and more densely connected regions. Artifacts of the message-passing paradigm in Graph Neural Networks (GNNs), such as oversmoothing and oversquashing, have been attributed to these regions. However, it remains unclear how the topology of a graph interacts with the learned preferences of GNNs. We probe this correspondence through Massive Activations (MAs), which correspond to extreme edge activation values in Graph Transformers. Our findings on synthetic graphs and molecular benchmarks reveal that MAs do not preferentially concentrate on curvature extremes, despite their theoretical link to information flow. On the Long Range Graph Benchmark, we identify a systemic curvature shift: global attention mechanisms exacerbate topological bottlenecks, drastically increasing the prevalence of negative curvature. Our work reframes curvature as a diagnostic probe for understanding when and why graph learning fails.

Paper Structure (44 sections, 5 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: The standard barbell graph (a.), the simple modified barbell graph (b.) and the extended modified barbell graph (c.) considered. The source node (orange) contains a signal to be received by the target node (blue). The modified barbells add one or more dummy cliques containing nodes with no information about the task. These dummy cliques contain a dummy signal node (pink). The green bridge connects the source and target cliques (task-relevant), while pink bridges connect dummy cliques to the target (task-irrelevant).
  • Figure 2: Activation ratios for the three barbell graphs: standard (a., d.), modified (b., e.) and extended modified (c., f.). The top row (a., b., c.) shows the ratios when the edge features are topologically accurate, the bottom row (d., e., f.) when the features are permuted. In the modified and extended modified datasets, the dummy bridges (blue) show the same (or higher) activation ratios as the task-relevant bridge edge (green). Note the log scale on the $y$-axis.
  • Figure 3: Proportions of massively activated edges (based on the 95th percentile) within each curvature value, per model and per dataset: (a.) ZINC, (b.) Tox21. For each curvature value ($x$-axis) we show the relative proportion (model-wise) of edges that have been designated as massively active. For each model, the region is shaded according to the number of edges present relative to all massively activated edges of that model.
  • Figure 4: Enrichment values for all three models over both datasets: ZINC (a.) and Tox21 (b.). The enrichment values show that the relation between curvature and massive activations is non-monotonic: topological bottlenecks (indicated by extreme negative curvatures) are marginally over-represented, and positively curved edges are also over-represented. The dotted line indicates $E = 1$, meaning no over- or under-representation.
  • Figure 5: Layer-wise evolution of activation ratios of MAs across curvature bins for Tox21 (a.) and ZINC (b.). Each panel corresponds to a BFc value, and lines show the mean activation ratio per layer. ZINC exhibits uniform early-layer decay across all curvature values, while Tox21 maintains elevated ratios throughout, with GT showing late-layer spikes at negative curvatures.
  • ...and 5 more figures
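
The captions of Figures 3 and 4 refer to a 95th-percentile threshold for massively activated edges and to an enrichment value $E$, with $E = 1$ meaning no over- or under-representation. The page does not reproduce the paper's exact definitions, so the following is only a minimal sketch under stated assumptions: an edge is treated as massively activated when its absolute activation exceeds the chosen percentile, and enrichment for a curvature value $k$ is taken to be the share of massively activated edges with curvature $k$ divided by the share of all edges with curvature $k$. The function names `massive_mask` and `enrichment_by_curvature` are hypothetical and do not come from the paper.

```python
import numpy as np


def massive_mask(activations, percentile=95.0):
    """Flag edges whose absolute activation exceeds the given percentile.

    Assumption: thresholding on absolute value; the paper may use a
    different quantity (e.g. an activation ratio).
    """
    magnitudes = np.abs(np.asarray(activations, dtype=float))
    threshold = np.percentile(magnitudes, percentile)
    return magnitudes > threshold


def enrichment_by_curvature(curvatures, mask):
    """Return {curvature value: enrichment} for a boolean edge mask.

    Assumed definition:
        E(k) = P(curvature = k | massive) / P(curvature = k)
    so E(k) = 1 means curvature k is neither over- nor under-represented.
    """
    curvatures = np.asarray(curvatures)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no edges; enrichment is undefined")

    result = {}
    for k in np.unique(curvatures):
        in_bin = curvatures == k
        share_among_massive = in_bin[mask].mean()  # P(k | massive)
        share_overall = in_bin.mean()              # P(k), > 0 by construction
        result[float(k)] = float(share_among_massive / share_overall)
    return result
```

With per-edge curvature values and activations stored as aligned arrays, `enrichment_by_curvature(curvatures, massive_mask(activations))` would give one value per curvature bin, which can then be compared against the $E = 1$ reference line.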