FlexiDrop: Theoretical Insights and Practical Advances in Random Dropout Method on GNNs

Zhiheng Zhou; Sihao Liu; Weichen Zhao

TL;DR

This work addresses overfitting, over-smoothing, and non-robustness in graph neural networks by deriving a Rademacher-complexity bound that ties the generalization error to a function of the dropout rate. It then combines that function with the empirical loss in a single objective and introduces FlexiDrop, an adaptive dropout framework with trainable per-layer dropout probabilities and a dropout regularizer that minimizes the bound. The theoretical results show that the empirical Rademacher complexity of the loss class is upper-bounded as $\mathfrak{R}_{\mathcal{S}}(\ell \circ f^{(L)}) \leq M \cdot \prod_{l=1}^{L} B^{(l)} \|\mathbf{p}^{(l)}\|_{2}$, so jointly optimizing the weights and the dropout probabilities acts on the bound directly. Empirically, the paper reports improved accuracy, reduced over-smoothing, and enhanced robustness on six benchmark datasets, with FlexiDrop outperforming standard dropout variants across graph, node, and link tasks.
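
As a rough illustration of how such a bound can be folded into training, one plausible form of the combined objective is sketched here. The trade-off weight $\lambda$, the empirical loss symbol $\hat{\mathcal{L}}$, and the weight symbol $\mathbf{W}$ are assumptions introduced for illustration and are not notation taken from the paper; the constant $M$ is absorbed into $\lambda$.

$$
\min_{\mathbf{W},\, \mathbf{p}^{(1)}, \dots, \mathbf{p}^{(L)}} \; \hat{\mathcal{L}}\big(\mathbf{W}, \mathbf{p}^{(1)}, \dots, \mathbf{p}^{(L)}\big) \;+\; \lambda \prod_{l=1}^{L} B^{(l)} \, \|\mathbf{p}^{(l)}\|_{2}
$$

Under this reading, shrinking the norms $\|\mathbf{p}^{(l)}\|_{2}$ tightens the complexity term, while the empirical loss typically pushes in the opposite direction, which is the trade-off that joint optimization is meant to balance.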

Abstract

Graph Neural Networks (GNNs) are powerful tools for handling graph-structured data. GNNs have recently been applied widely across domains, but they still suffer from overfitting, over-smoothing, and non-robustness. Existing research indicates that random dropout methods are an effective way to address these issues. However, random dropout methods in GNNs still face unresolved problems. In particular, the dropout rate is usually chosen by heuristics or grid search, and a poorly chosen rate can increase the generalization error, contradicting the principal aim of dropout. In this paper, we propose a novel random dropout method for GNNs called FlexiDrop. First, we conduct a theoretical analysis of dropout in GNNs using Rademacher complexity and show that the generalization error of traditional random dropout methods is bounded by a function of the dropout rate. We then use this function as a regularizer, unifying the dropout rate and the empirical loss within a single loss function and optimizing them simultaneously. As a result, our method adjusts the dropout rate adaptively and theoretically balances the trade-off between model complexity and generalization ability. Extensive experiments on benchmark datasets show that FlexiDrop outperforms traditional random dropout methods in GNNs.
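
To make the idea of using the bound as a regularizer concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each layer has a vector of per-feature retention probabilities, that masks are Bernoulli with inverse-probability rescaling, and that the regularizer is the product of per-layer terms from the bound. The function names, the weight `lam`, and the per-layer constants `weight_bounds` are hypothetical. In an actual training loop the probabilities would be parameters in an autodiff framework rather than fixed arrays.

```python
import numpy as np

def bound_regularizer(retain_probs, weight_bounds):
    """Product over layers of B^(l) * ||p^(l)||_2 (dropout-dependent factor of the bound)."""
    term = 1.0
    for p, b in zip(retain_probs, weight_bounds):
        term *= b * np.linalg.norm(p, ord=2)
    return term

def apply_dropout(h, p, rng):
    """Keep feature j of every node with probability p[j] (assumed per-feature masking).

    Kept entries are divided by p[j] so the expected output equals h.
    Requires every p[j] > 0.
    """
    mask = rng.random(h.shape) < p  # p broadcasts across the node axis
    return h * mask / p

def total_loss(empirical_loss, retain_probs, weight_bounds, lam=1e-3):
    """Empirical loss plus the weighted bound term; lam is a hypothetical trade-off weight."""
    return empirical_loss + lam * bound_regularizer(retain_probs, weight_bounds)

# Toy usage: 4 nodes with 3 features each, one layer of dropout.
rng = np.random.default_rng(0)
h = np.ones((4, 3))
p = np.array([0.5, 0.8, 1.0])
out = apply_dropout(h, p, rng)

# Two-layer regularizer with hypothetical constants B = [2.0, 3.0].
reg = bound_regularizer([np.array([0.6, 0.8]), np.array([1.0, 0.0])], [2.0, 3.0])
loss = total_loss(1.0, [np.array([0.6, 0.8]), np.array([1.0, 0.0])], [2.0, 3.0], lam=0.5)
```

A feature with retention probability 1.0 is never dropped, so the last column of `out` is unchanged, while kept entries in the first column are scaled by 1 / 0.5.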

Paper Structure (25 sections, 47 equations, 4 figures, 4 tables)

Introduction
Related Work
Limitations of GNNs
Dropout
Dropout with GNNs
Theoretical Analysis
Rademacher complexity
Dropout in graph neural networks
FlexiDrop
Setup
Upper bound of generalization error
FlexiDrop
Experiment
Experimental Setup
Datasets
...and 10 more sections

Figures (4)

Figure 1: Experimental results for three GNN backbone models and their dropout-based variants on the Cora and PubMed datasets at different dropout rates. The optimal dropout rate differs among the models, and at some dropout rates the dropout-based variants perform worse than the backbone models.
Figure 2: Framework of FlexiDrop. Initially, node embeddings generate the initial features and message passing occurs across the graph, producing a message matrix. Then, messages are weighted and aggregated using aggregation coefficients to form center node embeddings. Simultaneously, the model generates an adaptive dropout vector representing the retention probability of node embeddings at each layer. Finally, these updated embeddings are used for downstream tasks. Throughout the process, dropout rates are adaptively learned through forward and backward propagation to optimize task performance.
Figure 3: Experimental results of the over-smoothing analysis on node classification, using the PubMed dataset.
Figure 4: The experimental results of parameter analysis under different tasks.

Theorems & Definitions (2)

proof
proof: Proof of Theorem \ref{thm4.1}
