Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation

Shiyuan Li, Yixin Liu, Qingfeng Chen, Geoffrey I. Webb, Shirui Pan

TL;DR

This work tackles unsupervised graph representation learning when node features are contaminated by noise. It analyzes the dual role of feature propagation as both a denoiser (low-pass filter) and a potential diffuser of noise, showing that a fixed propagation step is suboptimal across nodes. The authors propose MQE, which combines augmented multi-hop propagation on a kNN-augmented graph with a Gaussian quality model conditioned on learnable meta-representations to estimate per-hop feature quality and produce robust node embeddings; the meta-representations serve as the final node representations for downstream tasks. Extensive experiments on five real-world datasets with various feature noise patterns demonstrate MQE’s strong performance relative to baselines and its ability to approximate per-node noise intensity via the learned quality parameters. Overall, MQE offers a practical path to robust, unsupervised graph representations in realistic noisy settings and provides insights into node-level feature quality through variance-based estimation.
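
The denoise-versus-diffuse behaviour of propagation described above can be illustrated on a toy graph. The following is a minimal sketch, not the authors' code: it assumes propagation with the self-looped symmetric normalized adjacency matrix, and the ring graph, the function names `sym_norm_adj` and `multi_hop`, and the noise value are all hypothetical choices made for illustration.

```python
import numpy as np

def sym_norm_adj(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix.

    Adding self-loops is an assumption of this sketch.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def multi_hop(adj: np.ndarray, x: np.ndarray, num_hops: int) -> list:
    """Return [X, A_norm X, A_norm^2 X, ...] up to num_hops steps."""
    a_norm = sym_norm_adj(adj)
    feats = [x]
    for _ in range(num_hops):
        feats.append(a_norm @ feats[-1])
    return feats

# Toy setup: a 6-node ring whose clean feature is the constant 1.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

clean = np.ones((n, 1))
noisy = clean.copy()
noisy[0, 0] += 3.0  # only node 0 carries noise

# On this regular graph the constant clean signal is left unchanged by
# propagation, so the deviation from `clean` is exactly the propagated noise.
errors = [np.abs(f - clean)[:, 0] for f in multi_hop(adj, noisy, num_hops=2)]
for hop, err in enumerate(errors):
    print(f"hop {hop}: per-node error = {np.round(err, 3)}")
```

In this toy case the error at the noisy node shrinks after one step while its previously clean neighbours pick up error, which is the trade-off that makes a single fixed propagation step a poor fit for every node.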

Abstract

Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods assume that node features are noise-free, so they fail to distinguish between useful information and noise when applied to real data with noisy features, which degrades the quality of the learned representations. This motivates us to account for noisy node features in real-world UGRL. Through empirical analysis, we reveal that feature propagation, the essential operation in GNNs, acts as a "double-edged sword" in handling noisy features: it can both denoise and diffuse noise, leading to feature quality that varies across nodes and even within the same node at different hops. Building on this insight, we propose a novel UGRL method based on Multi-hop feature Quality Estimation (MQE for short). Unlike most UGRL models, which directly use propagation-based GNNs to generate representations, our approach learns representations by estimating the quality of propagated features at different hops. Specifically, we introduce a Gaussian model that uses a learnable "meta-representation" as a condition to estimate the expectation and variance of multi-hop propagated features via neural networks. In this way, the "meta-representation" captures the semantic and structural information underlying multiple propagated features but is naturally less susceptible to interference by noise, thereby serving as a high-quality node representation beneficial for downstream tasks. Extensive experiments on multiple real-world datasets demonstrate that MQE learns reliable node representations in scenarios with diverse types of feature noise.
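
To make the Gaussian quality model more concrete, the sketch below computes a Gaussian negative log-likelihood of the propagated features at each hop, with the mean and standard deviation predicted from a meta-representation. This is an illustration under stated assumptions rather than the paper's implementation: the estimators are plain linear maps standing in for the neural networks, each node gets one scalar standard deviation per hop, and the names `gaussian_nll`, `quality_loss`, and `heads` are hypothetical.

```python
import numpy as np

def gaussian_nll(x: np.ndarray, mu: np.ndarray, log_sigma: np.ndarray) -> float:
    """Mean negative log-likelihood of x under N(mu, sigma^2 I).

    x, mu: (n, d); log_sigma: (n, 1), one scalar per node (an assumption).
    The additive constant 0.5 * log(2 * pi) is dropped.
    """
    inv_var = np.exp(-2.0 * log_sigma)
    per_entry = 0.5 * inv_var * (x - mu) ** 2 + log_sigma
    return float(per_entry.mean())

def quality_loss(z: np.ndarray, hop_feats: list, heads: list) -> float:
    """Average the Gaussian NLL over hops.

    z: (n, d_z) meta-representations.
    hop_feats: list of (n, d) propagated features, one per hop.
    heads: list of (w_mu, w_sigma) pairs with shapes (d_z, d) and (d_z, 1);
           linear stand-ins for the mean and variance networks.
    """
    total = 0.0
    for x_k, (w_mu, w_sigma) in zip(hop_feats, heads):
        mu = z @ w_mu            # predicted mean, shape (n, d)
        log_sigma = z @ w_sigma  # predicted log std, shape (n, 1)
        total += gaussian_nll(x_k, mu, log_sigma)
    return total / len(hop_feats)

# A large residual is penalized less when the predicted sigma is larger,
# which is what lets sigma act as a per-node, per-hop quality estimate.
x = np.full((2, 3), 3.0)
mu = np.zeros((2, 3))
loss_small_sigma = gaussian_nll(x, mu, np.zeros((2, 1)))
loss_large_sigma = gaussian_nll(x, mu, np.full((2, 1), np.log(3.0)))
```

In a trainable version, `z` and the estimator weights would be optimized jointly with an automatic-differentiation framework; that part is omitted here.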

Paper Structure

This paper contains 17 sections, 6 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: t-SNE (van der Maaten and Hinton, 2008) visualization of features propagated with the symmetric normalized adjacency matrix on the Cora dataset. (a) Original features; (b), (e) features after perturbation by noise; (c), (f) noisy features after 2-step propagation; (d), (g) noisy features after 16-step propagation.
  • Figure 2: A toy example illustrating the optimal number of propagation steps for different nodes. The darkness of each node's color indicates its noise intensity (clean, lightly noisy, or heavily noisy). For node A, a 1-step propagation that aggregates its 1-hop messages is beneficial for feature denoising. For node B, a larger number of propagation steps (e.g., 3) is preferred to generate a reliable representation by aggregating more neighboring information.
  • Figure 3: The performance of UGRL models with the default propagation step, a moderate propagation step (with suffix "M"), and a larger propagation step (with suffix "L") on the Cora dataset with different noise levels and types.
  • Figure 4: The distribution of the optimal propagation step for each node on the Cora dataset with different noise types.
  • Figure 5: The pipeline of MQE. First, a kNN-based graph structure augmentation is conducted to generate the augmented adjacency matrix $\mathbf{A}^*$. Then, the propagated features are acquired by augmented multi-hop propagation with $\mathbf{A}^*$. Afterward, in propagated feature quality estimation, we take meta representations $\mathbf{Z}$ as the condition to estimate the mean $\mu$ and standard deviation $\sigma$ of the distribution of propagated features. $\mathbf{Z}$, finally, serves as the learned representations for downstream tasks.
  • ...and 3 more figures
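
The first stage of the pipeline in Figure 5, kNN-based structure augmentation, could look roughly like the sketch below. The captions do not say how the kNN graph is built or merged with the original edges, so several choices here are assumptions: cosine similarity on node features, symmetrization of the kNN graph, and an element-wise union with the original adjacency matrix. The function name `knn_augment` and the toy data are hypothetical.

```python
import numpy as np

def knn_augment(adj: np.ndarray, x: np.ndarray, k: int) -> np.ndarray:
    """Return an augmented adjacency A* = A OR kNN(X), as a dense 0/1 matrix.

    Cosine similarity, symmetrization, and the union rule are assumptions.
    """
    n = adj.shape[0]
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_unit = x / np.maximum(norms, 1e-12)   # guard against zero rows
    sim = x_unit @ x_unit.T
    np.fill_diagonal(sim, -np.inf)          # a node is not its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # k most similar nodes per row

    knn = np.zeros((n, n))
    knn[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    knn = np.maximum(knn, knn.T)            # make the kNN graph undirected
    return np.maximum(adj.astype(float), knn)

# Toy data: nodes 0 and 1 have similar features, as do nodes 2 and 3,
# but the only original edge connects the dissimilar nodes 0 and 2.
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
adj = np.zeros((4, 4))
adj[0, 2] = adj[2, 0] = 1.0

a_star = knn_augment(adj, x, k=1)
print(a_star)
```

The augmented matrix keeps the original edge and adds edges between feature-similar nodes, so multi-hop propagation on it can draw on both structural and feature-based neighbours.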