GATE: How to Keep Out Intrusive Neighbors
Nimrah Mustafa, Rebekka Burkholz
TL;DR
GATE addresses a key limitation of Graph Attention Networks (GATs): they cannot selectively switch off task-irrelevant neighborhood aggregation, which harms learning in deep GNNs and on heterophilic graphs. The authors extend GAT to GATE by separating the budgets for node-feature and neighborhood contributions. The design is motivated by a conservation law of gradient flow, which explains why GAT struggles to switch off aggregation during training and how GATE avoids this. They provide theoretical insights, a synthetic test bed, and extensive experiments in which GATE outperforms GAT and many baselines, achieving state-of-the-art results on ogb-arxiv and strong performance on heterophilic real-world data. The result is a flexible architecture that benefits from greater depth and yields interpretable attention patterns.
Abstract
Graph Attention Networks (GATs) are designed to provide flexible neighborhood aggregation that assigns weights to neighbors according to their importance. In practice, however, GATs are often unable to switch off task-irrelevant neighborhood aggregation, as we show experimentally and analytically. To address this challenge, we propose GATE, a GAT extension that holds three major advantages: i) It alleviates over-smoothing by addressing its root cause of unnecessary neighborhood aggregation. ii) Similarly to perceptrons, it benefits from higher depth as it can still utilize additional layers for (non-)linear feature transformations in case of (nearly) switched-off neighborhood aggregation. iii) By down-weighting connections to unrelated neighbors, it often outperforms GATs on real-world heterophilic datasets. To further validate our claims, we construct a synthetic test bed to analyze a model's ability to utilize the appropriate amount of neighborhood aggregation, which could be of independent interest.
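To make the "switching off" problem concrete, the following is a minimal sketch of how attention coefficients are computed in the original single-head GAT formulation (score, LeakyReLU, softmax over the node and its neighbors). It is an illustration only and does not reproduce the GATE parameterization from the paper; the function name `gat_attention`, the toy dimensions, and the random inputs are all assumptions made for this example.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Standard LeakyReLU; slope 0.2 is the value commonly used with GAT.
    return np.where(x > 0, x, slope * x)

def gat_attention(h, W, a, i, neighbors):
    """Attention weights of node i over itself (index 0) and its neighbors.

    Hypothetical helper for illustration: h is (n, d_in) node features,
    W is (d_in, d_out), a is a (2 * d_out,) attention vector.
    """
    idx = [i] + list(neighbors)          # self-loop first, then neighbors
    z = h @ W                            # linearly transformed features
    scores = np.array(
        [leaky_relu(a @ np.concatenate([z[i], z[j]])) for j in idx]
    )
    scores = scores - scores.max()       # numerically stable softmax
    weights = np.exp(scores)
    return weights / weights.sum()

# Toy example with random values (not data from the paper).
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)

alpha = gat_attention(h, W, a, i=0, neighbors=[1, 2, 3])
self_weight = alpha[0]
neighbor_weight = alpha[1:].sum()
```

Because the softmax assigns every neighbor a strictly positive weight, `neighbor_weight` can only approach zero if the self-loop score exceeds all neighbor scores by a large margin. This is the sense in which a GAT-style layer has no direct way to switch neighborhood aggregation off.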
