AGS-GNN: Attribute-guided Sampling for Graph Neural Networks
Siddhartha Shankar Das, S M Ferdous, Mahantesh M Halappanavar, Edoardo Serra, Alex Pothen
TL;DR
AGS-GNN addresses the core challenge of heterophily in graph learning by introducing attribute-guided sampling with two complementary channels: similarity-based sampling to reinforce local homophily, and diversity-based sampling (via submodular gains) to promote class diversity in heterophilic neighborhoods. A pre-computed, parallelizable probability distribution (with an option for a learnable similarity model) enables scalable, inductive training when integrated with existing GNNs. Empirical results across 35 datasets show that AGS-GNN achieves competitive accuracy on heterophilic graphs and often surpasses full-graph baselines, while converging faster than random neighborhood sampling. The framework is flexible, compatible with node- and graph-sampling pipelines, and extends existing GNNs with a principled sampling paradigm to better handle real-world graphs that mix local homophily and heterophily.
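The two sampling channels can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It makes several assumptions: cosine similarity is the feature-similarity measure, the similarity channel weights each neighbor by its shifted cosine similarity to the target node, and the diversity channel uses greedy marginal gains of a facility-location function (one standard submodular choice) as weights. All function names (`similarity_distribution`, `diversity_distribution`, `precompute`, `sample_neighbors`) are hypothetical. Both distributions are computed once per node before training, which mirrors the pre-computation described above.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def similarity_distribution(x, adj, u, eps=1e-6):
    """Assumed similarity channel: probability proportional to shifted cosine similarity to u."""
    nbrs = adj[u]
    # Shift cosine from [-1, 1] to [0, 2] so weights are non-negative; eps keeps every neighbor sampleable.
    w = [cosine(x[u], x[v]) + 1.0 + eps for v in nbrs]
    total = sum(w)
    return nbrs, [wi / total for wi in w]

def diversity_distribution(x, adj, u, eps=1e-6):
    """Assumed diversity channel: greedy facility-location marginal gains used as weights."""
    nbrs = adj[u]
    # Non-negative pairwise similarity among u's neighbors, mapped to [0, 1].
    sim = {(a, b): (cosine(x[a], x[b]) + 1.0) / 2.0 for a in nbrs for b in nbrs}
    covered = {a: 0.0 for a in nbrs}  # best similarity of each neighbor to the selected set so far
    gains, remaining = {}, set(nbrs)
    while remaining:
        def gain(c):
            # Marginal gain of adding c: how much it improves coverage of all neighbors.
            return sum(max(0.0, sim[(a, c)] - covered[a]) for a in nbrs)
        best = max(sorted(remaining), key=gain)
        gains[best] = gain(best)
        for a in nbrs:
            covered[a] = max(covered[a], sim[(a, best)])
        remaining.remove(best)
    # Neighbors redundant with already selected ones receive small gains, hence low probability.
    w = [gains[v] + eps for v in nbrs]
    total = sum(w)
    return nbrs, [wi / total for wi in w]

def precompute(x, adj, fn):
    """Compute a per-node sampling distribution once, before training."""
    return {u: fn(x, adj, u) for u in adj if adj[u]}

def sample_neighbors(dist, u, k, rng):
    """Draw up to k distinct neighbors of u from its pre-computed distribution."""
    nbrs, p = dist[u]
    pool, weights, chosen = list(nbrs), list(p), []
    for _ in range(min(k, len(nbrs))):
        i = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen

# Toy graph: node 0 has two near-duplicate neighbors (1, 3) and one dissimilar neighbor (2).
x = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.95, 0.05]}
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
sim_dist = precompute(x, adj, similarity_distribution)
div_dist = precompute(x, adj, diversity_distribution)
print(sample_neighbors(sim_dist, 0, 2, random.Random(0)))
```

In this toy graph, the similarity channel gives neighbor 2 the lowest probability, while the diversity channel ranks neighbor 2 above neighbor 3, because 3 is nearly redundant with 1 once 1 has been selected.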
Abstract
We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits the node features and connectivity structure of a graph while adapting to both homophily and heterophily. (In homophilic graphs, vertices of the same class are more likely to be connected; in heterophilic graphs, vertices of different classes tend to be linked.) While GNNs have been applied successfully to homophilic graphs, their application to heterophilic graphs remains challenging. The best-performing GNNs for heterophilic graphs do not fit the sampling paradigm, incur high computational costs, and are not inductive. We employ samplers based on feature similarity and feature diversity to select subsets of neighbors for a node, and adaptively capture information from homophilic and heterophilic neighborhoods using dual channels. To our knowledge, AGS-GNN is currently the only algorithm that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which had not been used in this context prior to our work. The sampling distribution is pre-computed and highly parallel, achieving the desired scalability. Using an extensive collection of 35 small ($\le$ 100K nodes) and large ($>$ 100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN over current approaches in the literature. AGS-GNN achieves test accuracy comparable to the best-performing heterophilic GNNs, even outperforming methods that use the entire graph for node classification. AGS-GNN also converges faster than methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling.
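To make "controlling homophily in the sampled subgraph" concrete, the sketch below measures homophily with the edge homophily ratio: the fraction of edges whose endpoints share a class label. This is one common measure, assumed here for illustration; the paper may use a different metric. The function names and the toy labels are hypothetical, and the "similar" and "diverse" samples are hand-picked rather than produced by AGS-GNN.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label (edge homophily ratio)."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def sampled_edges(samples):
    """Edges of a sampled subgraph, given as {target node: [sampled neighbors]}."""
    return [(u, v) for u, nbrs in samples.items() for v in nbrs]

# Hypothetical toy labels: node 0 has two same-class and two different-class neighbors.
labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A"}
full = sampled_edges({0: [1, 2, 3, 4]})
similar = sampled_edges({0: [1, 4]})  # hand-picked same-class neighbors
diverse = sampled_edges({0: [1, 2]})  # hand-picked mix of classes

print(edge_homophily(full, labels))     # 0.5
print(edge_homophily(similar, labels))  # 1.0
print(edge_homophily(diverse, labels))  # 0.5
```

The point of the comparison is that the choice of sampler, not only the input graph, determines how homophilic the subgraph seen by the GNN is.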
