Effects of Dropout on Performance in Long-range Graph Learning Tasks

Jasraj Singh, Keyue Jiang, Brooks Paige, Laura Toni

TL;DR

This work analyzes how dropout-based methods used to alleviate over-smoothing in deep GNNs affect the ability to model long-range interactions (LRIs). Through a theoretical sensitivity analysis of linear GCNs, with bounds extending to nonlinear MPNNs, it shows that DropEdge variants typically reduce information flow between distant nodes, shrinking their effective receptive field. To address this, the authors introduce DropSens, a sensitivity-aware edge-dropping scheme that controls the fraction of information lost to edge-dropping, improving communication between distant nodes. Empirically, DropSens outperforms graph rewiring techniques on long-range tasks, while conventional DropEdge and Dropout variants can hinder performance on such tasks. These findings underscore the need to re-evaluate training strategies for deep GNNs and to prioritize LRIs when designing regularization methods.
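The linear-GCN sensitivity argument can be sketched as a short derivation. This is a minimal version under stated assumptions (a linear GCN with no nonlinearities, and DropEdge masks drawn independently at every layer), not necessarily the exact setup used in the paper.

```latex
% Linear GCN: H^{(0)} = X,  H^{(\ell)} = P^{(\ell)} H^{(\ell-1)} W^{(\ell)}
\bm{H}^{(L)} = \bm{P}^{(L)} \cdots \bm{P}^{(1)} \, \bm{X} \, \bm{W}^{(1)} \cdots \bm{W}^{(L)}

% Without dropping (P^{(\ell)} = P for every layer), the sensitivity of node i to node j is
\frac{\partial \bm{h}_i^{(L)}}{\partial \bm{x}_j}
  = \big(\bm{P}^{L}\big)_{ij} \, \big(\bm{W}^{(1)} \cdots \bm{W}^{(L)}\big)^{\top}

% With independent per-layer DropEdge masks the P^{(\ell)} are i.i.d., so the expectation factorises:
\mathbb{E}\!\left[\frac{\partial \bm{h}_i^{(L)}}{\partial \bm{x}_j}\right]
  = \Big(\big(\mathbb{E}[\bm{P}]\big)^{L}\Big)_{ij} \, \big(\bm{W}^{(1)} \cdots \bm{W}^{(L)}\big)^{\top}
```

Under these assumptions the expected sensitivity between nodes $i$ and $j$ depends on the graph only through the entries of $(\mathbb{E}[\bm{P}])^L$, which ties the analysis to the expected propagation matrix of Lemma 3.1.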

Abstract

Message Passing Neural Networks (MPNNs) are a class of Graph Neural Networks (GNNs) that propagate information across the graph via local neighborhoods. The scheme gives rise to two key challenges: over-smoothing and over-squashing. While several Dropout-style algorithms, such as DropEdge and DropMessage, have successfully addressed over-smoothing, their impact on over-squashing remains largely unexplored. This represents a critical gap in the literature, as failure to mitigate over-squashing would make these methods unsuitable for long-range tasks, the intended use case of deep MPNNs. In this work, we study these algorithms, together with the closely related edge-dropping algorithms DropNode, DropAgg and DropGNN, in the context of over-squashing. We present theoretical results showing that DropEdge variants reduce sensitivity between distant nodes, limiting their suitability for long-range tasks. To address this, we introduce DropSens, a sensitivity-aware variant of DropEdge that explicitly controls the proportion of information lost due to edge-dropping, thereby increasing sensitivity to distant nodes despite dropping the same number of edges. Our experiments on long-range synthetic and real-world datasets confirm the predicted limitations of existing edge-dropping and feature-dropping methods. Moreover, DropSens consistently outperforms graph rewiring techniques designed to mitigate over-squashing, suggesting that simple, targeted modifications can substantially improve a model's ability to capture long-range interactions. Our conclusions highlight the need to re-evaluate and re-design existing methods for training deep GNNs, with a renewed focus on modelling long-range interactions.
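For reference, the edge-sampling step of plain DropEdge can be sketched in a few lines of Python. This is a minimal sketch, assuming an undirected edge list and independent dropping of each edge with probability `q` (the dropping probability that appears in Lemma 3.1); `drop_edge` is a hypothetical helper name, not an API from the paper or any library. During training a fresh subset is typically resampled (e.g. every epoch), and the full graph is used at inference.

```python
import random


def drop_edge(edges, q, rng=None):
    """Hypothetical helper: return the edges kept after one DropEdge draw.

    Each undirected edge (u, v) is removed independently with probability q,
    so on average a fraction (1 - q) of the edges survives.
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so an edge is kept with probability 1 - q.
    return [e for e in edges if rng.random() >= q]


# Usage: a path graph with 1000 edges; roughly half survive when q = 0.5.
edges = [(i, i + 1) for i in range(1000)]
kept = drop_edge(edges, q=0.5, rng=random.Random(0))
print(len(kept))
```

Dropping edges independently keeps the sketch minimal; the variants discussed in the paper (DropNode, DropAgg, DropGNN, DropMessage) remove different units of the computation and are not covered here.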

Paper Structure

This paper contains 42 sections, 5 theorems, 36 equations, 9 figures, 9 tables.

Key Result

Lemma 3.1

The expected propagation matrix under DropEdge is given as: where $q\in[0, 1)$ is the dropping probability.
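Lemma 3.1 can be made concrete with a small numerical check. The sketch below ASSUMES a random-walk normalization with self-loops that are never dropped, $P = (D_M + I)^{-1}(M + I)$ for the masked adjacency $M$; the paper's normalization may differ. Under that assumption the surviving degree of node $i$ is $\mathrm{Binomial}(d_i, 1-q)$, the expected self-loop weight is $\mathbb{E}[1/(1+X)] = (1 - q^{d_i+1})/((d_i+1)(1-q))$, and the exchangeable neighbours split the remaining mass equally. `expected_propagation` and `monte_carlo_propagation` are hypothetical helpers; the Monte Carlo estimate only cross-checks the closed form.

```python
import random
from collections import defaultdict


def expected_propagation(adj, q):
    """Closed-form E[P] under DropEdge (ASSUMED normalisation, see lead-in).

    `adj` maps each node to its list of neighbours. Returns rows P[i][j].
    """
    P = {}
    for i, nbrs in adj.items():
        d = len(nbrs)
        # E[1 / (1 + X)] with X ~ Binomial(d, 1 - q): expected self-loop weight.
        p_self = (1 - q ** (d + 1)) / ((d + 1) * (1 - q))
        row = {i: p_self}
        for j in nbrs:
            # Neighbours are exchangeable, so they share the remaining mass evenly.
            row[j] = (1 - p_self) / d
        P[i] = row
    return P


def monte_carlo_propagation(adj, q, samples=20000, seed=0):
    """Estimate E[P] by sampling DropEdge masks on the undirected edges of `adj`."""
    rng = random.Random(seed)
    edges = [(u, v) for u, nbrs in adj.items() for v in nbrs if u < v]
    acc = defaultdict(float)
    for _ in range(samples):
        kept = {v: [] for v in adj}
        for u, v in edges:
            if rng.random() >= q:  # edge survives with probability 1 - q
                kept[u].append(v)
                kept[v].append(u)
        for i, nbrs in kept.items():
            w = 1.0 / (1 + len(nbrs))  # row-normalised weight, self-loop included
            acc[(i, i)] += w
            for j in nbrs:
                acc[(i, j)] += w
    return {key: total / samples for key, total in acc.items()}


# Usage: a triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
exact = expected_propagation(adj, q=0.5)
approx = monte_carlo_propagation(adj, q=0.5)
# exact[0][0] is 0.46875, above the undropped value 1/4.
print(exact[0][0], approx[(0, 0)])
```

Since $(1 - q^{d_i+1})/(1-q) = 1 + q + \dots + q^{d_i} > 1$ for $q > 0$ and $d_i \ge 1$, the expected self-loop weight exceeds the undropped value $1/(d_i+1)$ under this assumption, so less expected mass flows to neighbours at each hop.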

Figures (9)

  • Figure 1: Empirical sensitivity analysis using the Cora dataset.
  • Figure 2: Train and test MAE of 11-layer GCNs on the SyntheticZINC dataset, averaged over 10 initializations.
  • Figure 3: Relative change in test-time performance of a GCN using DropSens, compared to the baseline DropEdge, on real-world datasets (see the section on real-world experiments).
  • Figure 4: Entries of $\ddot{\bm{P}}^6$, averaged after binning node-pairs by their shortest distance.
  • Figure 5: Entries of $\sum_{\ell=0}^6 \dot{\bm{P}}^\ell$, averaged after binning node-pairs by their shortest distance.
  • ...and 4 more figures
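Figures 4 and 5 bin entries of propagation-matrix powers by shortest-path distance. A rough, hypothetical analogue of that binning is sketched below: it computes $(\mathbb{E}[\bm{P}])^L$ under an assumed random-walk normalization with undroppable self-loops (reading it as the expected $L$-layer propagation presumes masks drawn independently per layer), then averages entries over node pairs at each hop distance found by BFS. It is not the code behind the figures, and the matrices $\ddot{\bm{P}}$ and $\dot{\bm{P}}$ plotted there are not defined in this section.

```python
from collections import defaultdict, deque


def expected_dropedge_matrix(adj, q):
    """Dense E[P] for nodes 0..n-1 under DropEdge.

    ASSUMPTION: P = (D_M + I)^{-1}(M + I), each edge dropped independently
    with probability q, self-loops never dropped.
    """
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        d = len(nbrs)
        P[i][i] = (1 - q ** (d + 1)) / ((d + 1) * (1 - q))
        for j in nbrs:
            P[i][j] = (1 - P[i][i]) / d
    return P


def matpow(P, L):
    """P**L by repeated multiplication (adequate for small illustrative graphs)."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(L):
        R = [[sum(R[i][k] * P[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return R


def bfs_distances(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def binned_entries(adj, q, L):
    """Average entry of (E[P])^L over node pairs, grouped by shortest distance."""
    PL = matpow(expected_dropedge_matrix(adj, q), L)
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(adj)):
        for j, d in bfs_distances(adj, i).items():
            sums[d] += PL[i][j]
            counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


# Usage: a 10-node path graph and a 6-layer linear model.
n = 10
path = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
for q in (0.0, 0.5):
    bins = binned_entries(path, q, L=6)
    print(q, {d: round(v, 4) for d, v in bins.items()})
```

On a path graph the only length-$L$ walk between two nodes at distance exactly $L$ moves forward at every step, so its entry is a product of off-diagonal weights; each of these shrinks as $q$ grows under the assumed normalization, so the distance-$L$ bin decreases with $q$. Summing the powers $\ell = 0, \dots, L$ instead of taking a single power would mimic the kind of quantity plotted in Figure 5.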

Theorems & Definitions (7)

  • Lemma 3.1
  • Lemma 3.2
  • Theorem 3.1
  • Lemma
  • Proof
  • Theorem
  • Proof