Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs
Haolin Li, Haoyu Wang, Luana Ruiz
TL;DR
The paper tackles scaling GNNs to large graphs by addressing the drawbacks of random subgraph sampling, which can disconnect the sampled subgraph and reduce the model's expressive power. It introduces a sampling method based on feature homophily that minimizes $\mathrm{tr}(\mathbf{X}\mathbf{X}^T)$, the trace of the data correlation matrix, to better preserve the graph Laplacian trace $\mathrm{tr}(\mathbf{L})$. The method runs in $O(d|E|)$ time and avoids sequential node removals. Key contributions are a formal definition of feature homophily $h_G$, a lower bound linking $\mathrm{tr}(\mathbf{L})$ to $h_G$, and Algorithm 1, which selects subgraphs at lower complexity than spectral methods. Experiments on citation networks show better Laplacian-trace preservation and GNN transferability than random sampling. The approach offers a practical route to scalable, expressive GNNs on large homophilic graphs, and it connects to leverage scores and graph sparsification.
Abstract
Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.
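To make the two quantities in the abstract concrete, the sketch below computes the Laplacian trace of an induced subgraph and the trace of the data correlation matrix restricted to the sampled nodes. It is only an illustration under stated assumptions: the unnormalized Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is assumed, the graph is undirected with no self-loops, and the helper names `laplacian_trace` and `feature_trace`, the toy graph, and the toy features are all hypothetical. It does not reproduce the paper's Algorithm 1, whose selection rule is not given in this section.

```python
import numpy as np

def laplacian_trace(adj, nodes):
    """tr(L) for the subgraph induced by `nodes`, assuming L = D - A
    with a symmetric adjacency matrix and no self-loops."""
    sub = adj[np.ix_(nodes, nodes)]
    # The diagonal of L holds the node degrees, so tr(L) is the sum of
    # degrees, which equals the sum of all adjacency entries (2 * #edges).
    return float(sub.sum())

def feature_trace(X, nodes):
    """tr(X_S X_S^T) for the sampled rows X_S: the squared Frobenius norm."""
    Xs = X[nodes]
    return float(np.sum(Xs * Xs))

# Hypothetical toy graph on 4 nodes: edges 0-1, 1-2, 2-3, 0-2.
adj = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    adj[u, v] = adj[v, u] = 1.0

# Hypothetical node features (one row per node), chosen by hand.
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0],
              [3.0, 0.0]])

# Two candidate 3-node samples to compare.
for nodes in ([0, 1, 2], [0, 1, 3]):
    print(nodes, laplacian_trace(adj, nodes), feature_trace(X, nodes))
```

In this hand-built toy, the sample with the smaller feature trace happens to retain more edges. That pattern comes from how the example was constructed and is not a result from the paper. It only shows how the two traces can be compared for candidate subgraphs.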
