AGS-GNN: Attribute-guided Sampling for Graph Neural Networks

Siddhartha Shankar Das, S M Ferdous, Mahantesh M Halappanavar, Edoardo Serra, Alex Pothen

TL;DR

AGS-GNN addresses the challenge of heterophily in graph learning with attribute-guided sampling over two complementary channels. Similarity-based sampling reinforces local homophily, and diversity-based sampling (via submodular gains) promotes class diversity in heterophilic neighborhoods. A pre-computed, parallelizable probability distribution, optionally driven by a learnable similarity model, enables scalable, inductive training when AGS-GNN is integrated with existing GNNs. Empirical results across 35 datasets show that AGS-GNN achieves accuracy competitive with the best heterophilic GNNs and in some cases surpasses full-graph baselines, while converging faster than random neighborhood sampling. The framework is flexible and compatible with node- and graph-sampling pipelines. It extends existing GNNs with a principled sampling paradigm for real-world graphs that mix local homophily and heterophily.
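
The pre-computed, similarity-proportional sampling distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes cosine similarity between raw node features (AGS-GNN may instead use a learned similarity model) and clips negative similarities to zero. The helper names `similarity_sampling_probs`, `precompute`, and `sample_neighbors` are hypothetical.

```python
import numpy as np

def similarity_sampling_probs(X, neighbors, t, eps=1e-12):
    """Hypothetical helper: distribution over t's neighbors proportional to the
    cosine similarity of their features with the ego node t."""
    nbrs = np.asarray(neighbors[t], dtype=int)
    if len(nbrs) == 0:
        return nbrs, np.empty(0)
    xt = X[t] / (np.linalg.norm(X[t]) + eps)
    xn = X[nbrs] / (np.linalg.norm(X[nbrs], axis=1, keepdims=True) + eps)
    sim = np.clip(xn @ xt, 0.0, None)      # probabilities must be non-negative
    if sim.sum() == 0:                      # no positive similarity: fall back to uniform
        return nbrs, np.full(len(nbrs), 1.0 / len(nbrs))
    return nbrs, sim / sim.sum()

def precompute(X, neighbors):
    # Each node is handled independently, so this loop is embarrassingly parallel.
    return {t: similarity_sampling_probs(X, neighbors, t) for t in neighbors}

def sample_neighbors(table, t, k, rng):
    # Weighted sampling without replacement from the pre-computed distribution.
    nbrs, p = table[t]
    k = min(k, int(np.count_nonzero(p)))   # zero-probability neighbors are never drawn
    if k == 0:
        return nbrs[:0]
    return rng.choice(nbrs, size=k, replace=False, p=p)

# Toy graph (made up): node 0 has neighbors 1, 2, 3; node 2 is orthogonal to node 0.
X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.9, 0.1]])
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
table = precompute(X, neighbors)
rng = np.random.default_rng(0)
print(sample_neighbors(table, 0, 2, rng))  # always nodes 1 and 3, in some order
```

Because each node's distribution depends only on its own neighborhood, the expensive step can be split across workers ahead of training, leaving only a cheap weighted draw per mini-batch.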

Abstract

We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits the node features and connectivity structure of a graph while simultaneously adapting to both homophily and heterophily. (In homophilic graphs, vertices of the same class are more likely to be connected; in heterophilic graphs, vertices of different classes tend to be linked.) While GNNs have been successfully applied to homophilic graphs, their application to heterophilic graphs remains challenging. The best-performing GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high computational costs, and are not inductive. We employ samplers based on feature similarity and feature diversity to select subsets of neighbors for a node, and adaptively capture information from homophilic and heterophilic neighborhoods using dual channels. To our knowledge, AGS-GNN is currently the only algorithm that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which had not been used in this context before our work. The sampling distribution is pre-computed and highly parallel, achieving the desired scalability. Using an extensive collection of 35 small ($\le$ 100K nodes) and large (>100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compared to current approaches in the literature. AGS-GNN achieves test accuracy comparable to the best-performing heterophilic GNNs, even outperforming methods that use the entire graph for node classification. AGS-GNN also converges faster than methods that sample neighborhoods randomly, and it can be incorporated into existing GNN models that employ node or graph sampling.
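
Diversity-based ranking via submodular gains can be illustrated with the facility-location function $f(S)=\sum_j \max_{i\in S} \mathrm{sim}(i,j)$ and the standard greedy algorithm, which repeatedly adds the neighbor with the largest marginal gain. Facility location, cosine similarity, and the function name `facility_location_rank` are assumptions for illustration; the specific submodular function used by AGS-GNN may differ.

```python
import numpy as np

def facility_location_rank(F):
    """Greedily rank rows of F (neighbor features) by marginal gain of the
    facility-location function f(S) = sum_j max_{i in S} sim(i, j)."""
    Fn = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(Fn @ Fn.T, 0.0, None)  # non-negative similarities keep f monotone submodular
    n = sim.shape[0]
    cover = np.zeros(n)                  # best similarity of each neighbor to the selected set
    order, gains = [], []
    remaining = list(range(n))
    while remaining:
        # Marginal gain of adding candidate i: sum_j max(0, sim[i, j] - cover[j])
        g = [np.maximum(sim[i] - cover, 0.0).sum() for i in remaining]
        best = remaining[int(np.argmax(g))]
        order.append(best)
        gains.append(max(g))
        cover = np.maximum(cover, sim[best])
        remaining.remove(best)
    return order, np.array(gains)

# Two identical neighbors (rows 0 and 1) and one orthogonal neighbor (row 2).
F = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
order, gains = facility_location_rank(F)
print(order)  # [0, 2, 1]
```

The dissimilar neighbor is ranked ahead of the duplicate because, once row 0 is selected, the duplicate's marginal gain drops to zero. Normalizing the gains into probabilities is one possible way to obtain a diversity-based sampling distribution; whether AGS-GNN weights neighbors exactly this way is an assumption.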

Paper Structure (53 sections, 4 theorems, 15 equations, 14 figures, 17 tables, 8 algorithms)

Key Result

lemma 1

If the probability of selecting a neighboring node is proportional to its similarity to the ego node $t$, the local node homophily of the sampled neighborhood satisfies ${\mathcal{H}}_n'(t)\ge {\mathcal{H}}_n(t)$. If the sampling probability distribution is ${\mathcal{S}}$, then $P_{\mathcal{S}}(y_i=y_t)\ge P_{\m
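
A small worked example of the quantities in Lemma 1 follows. It assumes the standard definition of ${\mathcal{H}}_n(t)$ as the fraction of $t$'s neighbors that share its label, which equals the probability that a uniformly drawn neighbor has label $y_t$. The labels, similarity values, and helper names are made up, chosen so that same-label neighbors are more similar to $t$; the example illustrates the inequality on one instance and does not prove the lemma.

```python
import numpy as np

def local_node_homophily(labels, nbrs, t):
    # Assumed standard definition: fraction of t's neighbors with label y_t
    return float(np.mean(labels[nbrs] == labels[t]))

def same_label_prob(labels, nbrs, t, p):
    # Probability that one neighbor drawn from distribution p has label y_t
    return float(p[labels[nbrs] == labels[t]].sum())

labels = np.array([0, 0, 1, 1, 0])    # labels of nodes 0..4 (made up)
t, nbrs = 0, np.array([1, 2, 3, 4])
sim = np.array([0.9, 0.2, 0.1, 0.8])  # hypothetical similarities of nbrs to t
p_sim = sim / sim.sum()               # similarity-proportional sampling
p_uni = np.full(len(nbrs), 1 / len(nbrs))

print(local_node_homophily(labels, nbrs, t))              # 0.5
print(same_label_prob(labels, nbrs, t, p_uni))            # 0.5
print(round(same_label_prob(labels, nbrs, t, p_sim), 2))  # 0.85
```

Under uniform sampling the same-label probability equals ${\mathcal{H}}_n(t)$, while similarity-proportional sampling shifts probability mass toward the more similar, same-label neighbors.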

Figures (14)

  • Figure 1: The distribution of local node homophily in a homophilic and a heterophilic graph. Two further figures in the Appendix show this distribution for all datasets.
  • Figure 2: $F_1$ Score comparison of GSAGE and ACM-GCN on synthetic graphs generated from Squirrel (a) and Chameleon (b) datasets with varying node homophily.
  • Figure 3: AGS-GNN framework with node sampling. a) Pre-computation step that ranks the neighbors of each vertex. b) Weighted node sampling based on the selection probabilities of the ranked neighbors.
  • Figure 4: Computation graph with sample size $k=2$ and hop-size $2$. a), b) samples from similarity and diversity ranking for a single channel, c) dual channel with combined representation at the target node, and d) similar and diverse weighted samples at each sampled node.
  • Figure 5: Performance profile. The X-axis shows the difference in $F_1$-score (scaled to 100) between an algorithm and the best algorithm for a specific problem, and the Y-axis shows the fraction of problems on which the algorithm is within that difference. We compare AGS to two scalable homophilic GNNs (GSAGE, GSAINT) and two heterophilic GNNs (ACM-GCN, LINKX). For small ($<100K$ vertices) and large ($\ge100K$ vertices) graphs, we consider LINKX (full-batch) and LINKX$+$ (mini-batch), respectively. Full results are in the Appendix.
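
A performance profile like the one in Figure 5 can be computed as below, under the reading that a point (x, y) on an algorithm's curve means the algorithm is within x $F_1$ points of the best algorithm on a fraction y of the problems. The function name `performance_profile` and the score matrix are hypothetical; the numbers are not results from the paper.

```python
import numpy as np

def performance_profile(scores, xs):
    """scores: (n_problems, n_algorithms) F1-scores scaled to 100.
    Returns an array of shape (len(xs), n_algorithms) whose entry [k, a] is the
    fraction of problems on which algorithm a is within xs[k] points of the best."""
    scores = np.asarray(scores, dtype=float)
    gap = scores.max(axis=1, keepdims=True) - scores    # 0 for the best algorithm
    xs = np.asarray(xs, dtype=float)
    # Broadcast to (len(xs), n_problems, n_algorithms), then average over problems.
    return (gap[None, :, :] <= xs[:, None, None]).mean(axis=1)

# Hypothetical F1-scores: 3 problems (rows) x 3 algorithms (columns).
scores = [[80, 75, 60],
          [70, 72, 71],
          [90, 85, 88]]
profile = performance_profile(scores, xs=[0, 2, 5])
print(profile)  # row k holds the y-value of each algorithm's curve at x = xs[k]
```

At x = 0 each value is the fraction of problems an algorithm wins outright (ties count for every tied algorithm), and every curve reaches 1 once x exceeds that algorithm's largest gap.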

Theorems & Definitions (4)

  • lemma 1
  • lemma 2
  • lemma 3
  • lemma 4