Revisiting Graph Neural Networks: All We Have is Low-Pass Filters

Hoang NT, Takanori Maehara

TL;DR

The paper investigates why graph neural networks work for vertex classification by framing GNNs within graph signal processing and identifying a prevailing low-frequency assumption: useful information resides in low-frequency components of graph signals. It shows that graph filtering via adjacency or Laplacian-based operators acts as a low-pass mechanism, and that the observed performance can be explained by denoising rather than non-linear manifold learning. A new baseline, gfNN, is proposed, which filters features using graph operators and then learns with a standard model, providing speed and noise-robustness advantages while preserving performance under the low-frequency regime. The work further analyzes limitations of common baselines like SGC in nonlinear settings and offers practical guidance on when to rely on graph filters versus deeper graph architectures.
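The filter-then-learn idea behind gfNN can be illustrated with a small NumPy sketch. This is not the authors' implementation. The choice of operator (the symmetric normalized adjacency with self-loops), the number of filtering steps `k=2`, the function names, and the toy ring graph are all assumptions made for illustration.

```python
import numpy as np

def augmented_normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense symmetric adjacency matrix.

    The self-loops and the symmetric normalization are assumptions; other
    graph operators could be used in the same way.
    """
    a_tilde = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_tilde.sum(axis=1)                 # degrees are >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_filter(adj, features, k=2):
    """Apply the graph operator k times to the feature matrix (hypothetical helper)."""
    s = augmented_normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = s @ out
    return out

# Toy example: a ring graph whose true features are constant across nodes.
n = 20
idx = np.arange(n)
ring = np.zeros((n, n))
ring[idx, (idx + 1) % n] = 1.0
ring[(idx + 1) % n, idx] = 1.0

rng = np.random.default_rng(0)
clean = np.ones((n, 3))
noisy = clean + 0.5 * rng.standard_normal((n, 3))

# Step 1: filter the noisy features with the graph operator.
filtered = graph_filter(ring, noisy, k=2)

# Step 2 (not shown): pass `filtered` to any standard classifier.
err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(filtered - clean)
print(err_before, err_after)
```

In this toy setting the clean signal is constant over the graph, so the filter leaves it unchanged and only shrinks the noise. Real features are not this smooth, so how much filtering helps depends on how well the low-frequency assumption holds.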

Abstract

Graph neural networks have become one of the most important techniques to solve machine learning problems on graph-structured data. Recent work on vertex classification proposed deep and distributed learning models to achieve high performance and scalability. However, we find that the feature vectors of benchmark datasets are already quite informative for the classification task, and the graph structure only provides a means to denoise the data. In this paper, we develop a theoretical framework based on graph signal processing for analyzing graph neural networks. Our results indicate that graph neural networks only perform low-pass filtering on feature vectors and do not have the non-linear manifold learning property. We further investigate their resilience to feature noise and propose some insights on GCN-based graph neural network design.
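The low-pass claim can be made concrete in the spectral domain. The sketch below assumes the normalized Laplacian with self-loops as the frequency basis and a two-step adjacency filter, whose frequency response at Laplacian eigenvalue λ is (1 - λ)^2. The helper names and the ring graph are hypothetical and chosen only for illustration.

```python
import numpy as np

def graph_spectrum(adj):
    """Eigen-decompose the normalized Laplacian of the graph with self-loops.

    Returns eigenvalues in ascending order (the graph "frequencies") and the
    matching eigenvectors as columns (the graph Fourier basis).
    """
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    lap = np.eye(adj.shape[0]) - a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.eigh(lap)

def keep_low_frequencies(features, eigvecs, num_freqs):
    """Project features onto the num_freqs lowest-frequency eigenvectors."""
    basis = eigvecs[:, :num_freqs]
    return basis @ (basis.T @ features)

# Toy ring graph with random features.
n = 12
idx = np.arange(n)
ring = np.zeros((n, n))
ring[idx, (idx + 1) % n] = 1.0
ring[(idx + 1) % n, idx] = 1.0

rng = np.random.default_rng(0)
feats = rng.standard_normal((n, 4))

eigvals, eigvecs = graph_spectrum(ring)

# Frequency response of applying (I - L) twice: each component is scaled by (1 - lambda)^2.
response = (1.0 - eigvals) ** 2

low = keep_low_frequencies(feats, eigvecs, num_freqs=1)   # lowest frequency only
full = keep_low_frequencies(feats, eigvecs, num_freqs=n)  # all frequencies
print(np.round(eigvals, 3))
print(np.round(response, 3))
```

On this graph the zero-frequency component passes through with gain 1 and every other component is scaled by a factor below 1. The response is not strictly monotone for eigenvalues above 1, so the filter is low-pass in the sense of preserving the lowest frequencies best rather than in the sense of a sharp cutoff.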

Paper Structure

This paper contains 14 sections, 7 theorems, 22 equations, 5 figures, 2 tables.

Key Result

Theorem 2

Under the low-frequency assumption, the outcomes of SGC, GCN, and gfNN are similar to those of the corresponding neural networks using the true features.

Figures (5)

  • Figure 1: Accuracy by frequency components
  • Figure 2: A simple realization of gfNN
  • Figure 3:
  • Figure 4: Benchmark test accuracy on Cora (left), Citeseer (middle), and Pubmed (right) datasets. The noise level is measured by standard deviation of white noise added to the features.
  • Figure 5: Decision boundaries on 500 generated data samples following the two circles pattern

Theorems & Definitions (13)

  • Theorem 3
  • Lemma 5
  • Corollary 6
  • Theorem 7
  • Theorem 8
  • Proof of Theorem [thm:spectrum-shrinking]
  • Lemma 9
  • Proof
  • Proof of Lemma [lem:bias-variance]
  • Proof of Theorem [thm:MLP-vs-gfNN]
  • ...and 3 more