Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation
Shiyuan Li, Yixin Liu, Qingfeng Chen, Geoffrey I. Webb, Shirui Pan
TL;DR
This work tackles unsupervised graph representation learning when node features are contaminated by noise. It analyzes the dual role of feature propagation as both a denoiser (low-pass filter) and a potential diffuser of noise, showing that a fixed propagation step is suboptimal across nodes. The authors propose MQE, which combines augmented multi-hop propagation on a kNN-augmented graph with a Gaussian quality model conditioned on learnable meta-representations to estimate per-hop feature quality and produce robust node embeddings; the meta-representations serve as the final node representations for downstream tasks. Extensive experiments on five real-world datasets with various feature noise patterns demonstrate MQE’s strong performance relative to baselines and its ability to approximate per-node noise intensity via the learned quality parameters. Overall, MQE offers a practical path to robust, unsupervised graph representations in realistic noisy settings and provides insights into node-level feature quality through variance-based estimation.
Abstract
Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods make the idealized assumption that node features are noise-free, so they cannot distinguish useful information from noise when applied to real data with noisy features, which degrades the quality of the learned representations. This motivates us to take noisy node features into account in real-world UGRL. Through empirical analysis, we reveal that feature propagation, the essential operation in GNNs, acts as a "double-edged sword" in handling noisy features: it can both denoise and diffuse noise, so feature quality varies across nodes and even across hops within the same node. Building on this insight, we propose a novel UGRL method based on Multi-hop feature Quality Estimation (MQE for short). Unlike most UGRL models, which directly use propagation-based GNNs to generate representations, our approach learns representations by estimating the quality of propagated features at different hops. Specifically, we introduce a Gaussian model that uses a learnable "meta-representation" as a condition to estimate the expectation and variance of multi-hop propagated features via neural networks. In this way, the "meta-representation" captures the semantic and structural information underlying the multi-hop propagated features but is naturally less susceptible to interference from noise, and thereby serves as a high-quality node representation for downstream tasks. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of MQE in learning reliable node representations under diverse types of feature noise.
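
To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch of multi-hop feature propagation and a per-node Gaussian negative log-likelihood. It is an illustration under stated assumptions, not the authors' implementation. The symmetric normalization with self-loops, the isotropic per-node standard deviation, and the function names `normalized_adjacency`, `multi_hop_features`, and `gaussian_nll` are all assumptions made here. The kNN graph augmentation is omitted, as are the neural networks that would map a meta-representation to the mean and standard deviation.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}; this normalization is an assumption."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_hop_features(adj: np.ndarray, x: np.ndarray, num_hops: int) -> list:
    """Return [X, PX, P^2 X, ...] for hops 0..num_hops, with P the normalized adjacency."""
    p = normalized_adjacency(adj)
    feats = [x]
    for _ in range(num_hops):
        feats.append(p @ feats[-1])         # one more propagation step
    return feats


def gaussian_nll(target: np.ndarray, mean: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Per-node Gaussian NLL (additive constant dropped).

    target, mean: arrays of shape [n, d]; sigma: positive array of shape [n].
    A larger sigma down-weights the squared error of a node, so sigma can be
    read as an estimate of how noisy that node's propagated feature is.
    """
    d = target.shape[1]
    sq_err = ((target - mean) ** 2).sum(axis=1)
    return d * np.log(sigma) + sq_err / (2.0 * sigma ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy graph: a path over 4 nodes with 3-dimensional features.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    x = rng.normal(size=(4, 3))
    hops = multi_hop_features(adj, x, num_hops=2)
    # Stand-in for network outputs: zero mean and unit sigma for every node.
    for k, feat in enumerate(hops):
        loss = gaussian_nll(feat, np.zeros_like(feat), np.ones(4)).mean()
        print(f"hop {k}: mean NLL = {loss:.3f}")
```

In a full model, one such loss term per hop would be minimized jointly over the meta-representations and the networks that predict the mean and standard deviation, and the learned standard deviations could then be inspected as per-node, per-hop quality estimates.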
