CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph
Silu He, Qinyao Luo, Xinsha Fu, Ling Zhao, Ronghua Du, Haifeng Li
TL;DR
CAT addresses the marked loss of discrimination ability that GATs show on heterophilic graphs by identifying a Distraction Effect ($DE$) caused by dissimilar neighbors and reduced self-attention of the central node. It introduces two modules: Class-level Semantic Clustering, which forms a semantically meaningful space, and Total Effect ($TE$) estimation, which quantifies neighbor-induced distraction via do-interventions. It then trims the Distraction Neighbors with the lowest $TE$ to yield a trimmed graph. The method is plug-and-play with any LAMP-based GAT and is validated with three base GATs on seven heterophilic datasets, showing improved node classification and clearer class separation in the embedding space. Ablation and visualization experiments corroborate the core intuition of Low Distraction and High Self-attention, demonstrating that CAT increases self-attention while reducing distraction. Overall, CAT offers a causal, architecture-agnostic graph trimming approach that enhances discrimination in heterophilic settings and provides a principled framework for future graph learning with causal insights.
Abstract
The Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph. It effectively brings the representations of similar neighbors closer together and thus shows stronger discrimination ability. However, existing GATs suffer a significant decline in discrimination ability on heterophilic graphs, because the high proportion of dissimilar neighbors can weaken the self-attention of the central node; together, these cause the central node to deviate from similar nodes in the representation space. We call this effect generated by neighboring nodes the Distraction Effect (DE). To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). To estimate the DE, since it is generated through two paths (grabbing the attention assigned to neighbors and reducing the self-attention of the central node), we model it with the Total Effect, a causal estimand that can be estimated from intervened data. To weaken the DE, we identify the neighbors with the highest DE (we call them Distraction Neighbors) and remove them. We adopt three representative GATs as base models within the proposed CAT framework and conduct experiments on seven heterophilic datasets of three different sizes. Comparative experiments show that CAT improves the node classification accuracy of all base GAT models. Ablation experiments and visualization further validate the enhancement of discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.
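The estimate-then-trim idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the attention logits of one central node are already given (hypothetical values below), uses the gain in self-attention after removing a neighbor as a toy stand-in for the intervention-based effect, and omits Class-level Semantic Clustering entirely. The function names `attention`, `self_attention_gain`, and `trim_neighbors` are illustrative only.

```python
import math


def attention(logits):
    """Softmax over a dict of attention logits {node: score}."""
    m = max(logits.values())
    exps = {n: math.exp(s - m) for n, s in logits.items()}
    z = sum(exps.values())
    return {n: e / z for n, e in exps.items()}


def self_attention_gain(logits, center):
    """Toy effect score per neighbor (an assumption, not the paper's estimator):
    self-attention of `center` after removing that neighbor, minus before."""
    base = attention(logits)[center]
    gains = {}
    for n in logits:
        if n == center:
            continue
        # Intervention: drop neighbor n and recompute the attention distribution.
        intervened = {k: v for k, v in logits.items() if k != n}
        gains[n] = attention(intervened)[center] - base
    return gains


def trim_neighbors(logits, center, k=1):
    """Remove the k neighbors whose removal raises self-attention the most."""
    gains = self_attention_gain(logits, center)
    drop = sorted(gains, key=gains.get, reverse=True)[:k]
    kept = [n for n in logits if n != center and n not in drop]
    return kept, drop


# Hypothetical logits: "v" is the central node (self-loop); "a", "b", "c" are neighbors.
logits = {"v": 1.0, "a": 0.5, "b": 2.0, "c": 0.2}
kept, dropped = trim_neighbors(logits, "v", k=1)
print(kept, dropped)
```

With these toy logits, neighbor `b` takes the largest share of attention, so removing it restores the most self-attention to `v` and it is the one trimmed. In a real setting the logits would come from a trained base GAT, and the effect would be estimated as the paper describes rather than by this proxy.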
