Disambiguated Node Classification with Graph Neural Networks
Tianxiang Zhao, Xiang Zhang, Suhang Wang
TL;DR
Ambiguous, underrepresented regions in graphs cause GNNs to underperform due to the inductive bias of message passing. The authors introduce DisamGCL, which automatically identifies ambiguous nodes via temporal inconsistency in predictions and applies a topology-aware contrastive regularization with a joint objective $\mathcal{L}_{ce}+\lambda\mathcal{L}_{cs}$. They provide a memory-based ambiguity score and a JS-divergence contrastive objective with augmented positives/negatives, validated across six datasets and multiple backbones, yielding substantial gains especially for minority classes. The approach enhances the robustness and generalization of GNNs in heterogeneous graphs and offers a scalable path for integrating with other self-supervised strategies, with code made available for reproducibility.
Abstract
Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data across various domains. Despite this success, one critical challenge is often overlooked by existing work: learning message propagation that generalizes effectively to underrepresented graph regions. These minority regions often exhibit irregular homophily/heterophily patterns and diverse neighborhood class distributions, resulting in ambiguity. In this work, we investigate the ambiguity problem within GNNs, its impact on representation learning, and the development of richer supervision signals to counter it. We conduct a fine-grained evaluation of GNNs, analyzing the existence of ambiguity in different graph regions and its relation to node positions. To disambiguate node embeddings, we propose a novel method, DisamGCL, which exploits additional optimization guidance to enhance representation learning, particularly for nodes in ambiguous regions. DisamGCL identifies ambiguous nodes based on the temporal inconsistency of their predictions and introduces a disambiguation regularization by employing contrastive learning in a topology-aware manner. DisamGCL promotes the discriminability of node representations and can alleviate the semantic mixing caused by message propagation, effectively addressing the ambiguity problem. Empirical results validate the efficiency of DisamGCL and highlight its potential to improve GNN performance in underrepresented graph regions.
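
To make the idea of temporal inconsistency concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a history of per-node predicted class distributions is kept across training epochs and scores each node by the mean Jensen-Shannon divergence between consecutive epochs. The function names (`js_divergence`, `ambiguity_scores`, `joint_loss`), the use of JS divergence for the score, and the averaging rule are all illustrative assumptions. Only the form of the joint objective $\mathcal{L}_{ce}+\lambda\mathcal{L}_{cs}$ is taken from the summary above.

```python
import math


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log((ai + eps) / (bi + eps))
                   for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ambiguity_scores(history):
    """Hypothetical ambiguity score (an assumption, not the paper's definition).

    history[t][v] is the predicted class distribution of node v at epoch t.
    The score of a node is the mean JS divergence between its predictions
    at consecutive epochs, so nodes whose predictions keep changing score high.
    """
    num_epochs = len(history)
    num_nodes = len(history[0])
    scores = []
    for v in range(num_nodes):
        total = sum(js_divergence(history[t][v], history[t + 1][v])
                    for t in range(num_epochs - 1))
        scores.append(total / (num_epochs - 1))
    return scores


def joint_loss(l_ce, l_cs, lam):
    """Joint objective L_ce + lambda * L_cs; the loss values are given as inputs."""
    return l_ce + lam * l_cs


# Toy history: node 0 is stable, node 1 flips its prediction every epoch.
history = [
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.9, 0.1]],
]
scores = ambiguity_scores(history)
```

In this toy history the stable node receives a score of zero and the flipping node a strictly positive one, so thresholding or ranking the scores would single out the second node for additional regularization. How the selected nodes enter the contrastive term $\mathcal{L}_{cs}$ is not specified here and is left out of the sketch.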
