Structure-Guided Input Graph for GNNs facing Heterophily
Victor M. Tenorio, Madeline Navarro, Samuel Rey, Santiago Segarra, Antonio G. Marques
TL;DR
This paper tackles heterophily in graph neural networks (GNNs) by building structure-guided input graphs: it constructs two k-nearest-neighbor (KNN) graphs from structural features, ${\mathbf Z}_{\mathrm{role}}$ (role-based) and ${\mathbf Z}_{\mathrm{global}}$ (global), and combines them with the original graph through an adaptive fusion mechanism. The resulting multi-graph GNN (SG-GNN) learns one embedding ${\mathbf Z}_i$ per input graph together with fusion weights $\alpha_i$ satisfying $\sum_i \alpha_i = 1$. Optionally, the scalar weights are replaced by node-specific coefficient vectors $\boldsymbol{\alpha}_i$, so that each node can select its most informative graph. Experiments on six heterophilic datasets show that the structure-based graphs are more homophilic than the original graphs (and thus better suited to low-pass GNNs), and that SG-GNN consistently outperforms the best single-input graph. The learned weights also indicate which graph structure drives the predictions, which makes the framework both interpretable and flexible for node classification under heterophily. Key equations include the total variation $TV({\mathbf y}) = \| {\mathbf y} - {\mathbf A}{\mathbf y} \|_1$, the edge homophily ratio $h_{\mathrm{edge}} = \frac{|\{ (i,j) \in \mathcal{E} : y_i = y_j \}|}{|\mathcal{E}|}$, and the per-node fusion constraint $\sum_i [\boldsymbol{\alpha}_i]_n = 1$ for every node $n$.
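The edge homophily ratio and the sum-to-one fusion constraint can be illustrated with a minimal NumPy sketch. The function names `edge_homophily` and `fuse_embeddings` are hypothetical, and the softmax used to enforce the constraint is an assumption made for illustration; the summary only states that the weights sum to one, not how this is achieved.

```python
import numpy as np


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same.mean())


def fuse_embeddings(embeddings, logits):
    """Convex combination of per-graph embeddings.

    embeddings: array of shape (G, N, F), one (N, F) embedding per graph.
    logits: shape (G,) for one weight per graph, or (G, N) for
        node-specific weights.
    Assumption: a softmax over the graph axis makes the weights
    non-negative and sum to one (per node in the (G, N) case).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    logits = np.asarray(logits, dtype=float)
    # Numerically stable softmax over the graph axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    alpha = np.exp(shifted) / np.exp(shifted).sum(axis=0, keepdims=True)
    # Broadcast (G,) weights over all nodes; (G, N) weights are used as-is.
    weights = alpha if alpha.ndim == 2 else alpha[:, None]
    fused = (weights[:, :, None] * embeddings).sum(axis=0)
    return fused, alpha


# Toy example: a 4-node path with alternating labels is fully heterophilic,
# while a graph linking same-label nodes is fully homophilic.
labels = [0, 1, 0, 1]
h_original = edge_homophily([(0, 1), (1, 2), (2, 3)], labels)
h_alternative = edge_homophily([(0, 2), (1, 3)], labels)
```

With logits of shape `(G, N)`, each column of the returned `alpha` holds one node's weights over the input graphs, so a node can lean on whichever graph is most useful for it.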
Abstract
Graph Neural Networks (GNNs) have emerged as a promising tool to handle data exhibiting an irregular structure. However, most GNN architectures perform well on homophilic datasets, where the labels of neighboring nodes are likely to be the same. In recent years, an increasing body of work has been devoted to the development of GNN architectures for heterophilic datasets, where labels do not exhibit this low-pass behavior. In this work, we create a new graph in which nodes are connected if they share structural characteristics, meaning a higher chance of sharing their labels, and then use this new graph in the GNN architecture. To do this, we compute the k-nearest neighbors graph according to distances between structural features, which are either (i) role-based, such as degree, or (ii) global, such as centrality measures. Experiments show that the labels are smoother in this newly defined graph and that the performance of GNN architectures improves when using this alternative structure.
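The graph construction described above can be sketched as follows, using node degree as the only role-based feature. The helper names `degree_features` and `knn_graph`, the Euclidean distance, and the symmetrization by union are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def degree_features(adj):
    """Role-based feature: the degree of each node, as an (N, 1) array."""
    adj = np.asarray(adj)
    return adj.sum(axis=1, keepdims=True).astype(float)


def knn_graph(features, k):
    """Symmetric k-nearest-neighbor adjacency built from node features.

    Assumptions: squared Euclidean distance between feature vectors, and
    an edge is kept if either endpoint selects the other as a neighbor.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    # Pairwise squared Euclidean distances between feature vectors.
    diff = x[:, None, :] - x[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    # A stable sort breaks distance ties by node index, deterministically.
    nbrs = np.argsort(dist, axis=1, kind="stable")[:, :k]
    a = np.zeros((n, n), dtype=int)
    a[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    return np.maximum(a, a.T)


# Toy example: a 5-node path graph. The two endpoints have the same degree,
# so the structural graph links them although they are far apart originally.
adj = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[u, v] = adj[v, u] = 1
structural_adj = knn_graph(degree_features(adj), k=1)
```

The pairwise distance matrix costs O(N^2) memory, which is fine for a toy example; larger graphs would need an approximate or tree-based neighbor search.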
