NoiseHGNN: Synthesized Similarity Graph-Based Neural Network For Noised Heterogeneous Graph Representation Learning
Xiong Zhang, Cheng Xie, Haoran Duan, Beibei Yu
TL;DR
NoiseHGNN tackles the problem of learning robust representations from noised heterogeneous graphs by introducing a synthesized similarity graph and a similarity-aware encoder. Rather than directly correcting noisy edges, it encodes the noised and synthesized graphs with a shared-parameter encoder and aligns them through a contrastive mechanism between a meta-path-based graph and a target-based graph. The approach combines a graph synthesizer, graph augmentation, a similarity-aware HGNN encoder, and a contrastive learning objective, and it achieves state-of-the-art results on most benchmark datasets under edge-noise perturbations. Empirically, NoiseHGNN yields significant gains in Macro-F1 and Micro-F1 across multiple datasets and remains robust across noise levels, which is relevant to real-world heterogeneous graphs such as academic, social, and knowledge graphs.
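The contrastive objective mentioned above can be illustrated with a minimal sketch. The exact loss is not specified in this section, so the sketch swaps in a standard InfoNCE loss, a common choice in graph contrastive learning: the embedding of a target node from the original (meta-path-based) view and its embedding from the synthesized (target-based) view form the positive pair, and all other nodes act as negatives. The names `info_nce`, `z_orig`, `z_syn`, and the temperature `tau` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(z_orig: List[Vector], z_syn: List[Vector], tau: float = 0.5) -> float:
    """Hypothetical InfoNCE loss between two views of the same target nodes.

    Row i of z_orig (original-graph view) should agree with row i of z_syn
    (synthesized-graph view) and disagree with every row j != i.
    Lower values mean the two views are better aligned.
    """
    if len(z_orig) != len(z_syn) or not z_orig:
        raise ValueError("views must be non-empty and cover the same nodes")
    a = [_normalize(v) for v in z_orig]
    b = [_normalize(v) for v in z_syn]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(a[i], b[j])) / tau for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # views that agree incur a lower loss
```

In a full training loop, a term like this would presumably be added to the supervised losses on the two graph embeddings, weighted by a coefficient; that combination is an assumption, not a detail given here.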
Abstract
Real-world graph data intrinsically contain noise (e.g., link and structure errors) that inevitably degrades the effectiveness of graph representations and downstream learning tasks. For homogeneous graphs, the latest works use the original node features to synthesize a similarity graph that can correct the structure of the noised graph. This idea relies on the homogeneity assumption, which states that similar nodes in a homogeneous graph tend to be directly linked in the original graph. However, similar nodes in heterogeneous graphs are usually not directly linked (e.g., two similar authors in an academic graph are typically connected through shared papers rather than by an author-author edge), so the similarity graph cannot be used to correct the original noised graph. This poses a significant challenge for noised heterogeneous graph learning. To this end, this paper proposes a novel synthesized similarity-based graph neural network compatible with noised heterogeneous graph learning. First, we calculate the original feature similarities of all nodes to synthesize a similarity-based high-order graph. Second, we propose a similarity-aware encoder that embeds the original and synthesized graphs with shared parameters. Then, instead of graph-to-graph supervision, we synchronously supervise the original and synthesized graph embeddings to predict the same labels. Meanwhile, a target-based graph extracted from the synthesized graph is contrasted with the metapath-based graph extracted from the original graph to learn their mutual information. Extensive experiments on numerous real-world datasets show that the proposed method achieves state-of-the-art results in noised heterogeneous graph learning tasks. Notably, improvements of +5% to +6% over previous SOTA methods are observed on several noised datasets. The code and datasets are available at https://github.com/kg-cc/NoiseHGNN.
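The first step, synthesizing a graph from the original feature similarities of all nodes, can be illustrated with a minimal sketch. It assumes cosine similarity and a k-nearest-neighbor rule, and that all nodes already share one feature space (node types in a heterogeneous graph often have different feature dimensions and would need a type-specific projection first). None of these choices, nor the names `cosine` and `synthesize_similarity_graph`, come from the paper, and the sketch does not attempt the high-order construction described above.

```python
import math
from typing import List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def synthesize_similarity_graph(features: List[List[float]], k: int) -> Set[Tuple[int, int]]:
    """Hypothetical kNN similarity graph over raw node features.

    Each node is linked to its k most similar other nodes; edges are then
    treated as undirected and stored as (i, j) with i < j.
    """
    n = len(features)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        # Score every other node by cosine similarity of the original features.
        scores = [(cosine(features[i], features[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties broken by the smaller node index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        for _, j in scores[:k]:
            edges.add((min(i, j), max(i, j)))  # symmetrize the kNN relation
    return edges


# Two clusters of feature vectors yield two similarity edges.
print(synthesize_similarity_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], k=1))
```

Note that in this toy example the synthesized edges connect nodes that resemble each other, regardless of whether the original heterogeneous graph links them directly, which is exactly the gap the abstract describes.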
