Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling

Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi, Cesare Alippi

TL;DR

The paper tackles efficient hierarchical representation learning in GNNs by introducing Node Decimation Pooling (NDP), a topological pooling operator that pre-computes a pyramid of coarsened graphs offline. NDP combines a MAXCUT-based spectral node decimation, Kron reduction to build coarsened graphs, and a sparsification step to control density, enabling fast, scalable pooling across multiple GNN layers. The authors provide theoretical analyses of MAXCUT approximations, spectral preservation under sparsification, and practical implementation details, showing that NDP offers competitive accuracy with significantly reduced computation and memory compared to both topological and feature-based pooling methods. Empirical results on graph classification and graph signal classification demonstrate that NDP achieves higher or comparable accuracy while training faster than alternatives, with particularly strong performance on tasks like MNIST graph signals where purely topological methods excel. These findings suggest that pre-computed, topology-focused pooling can be a robust and efficient choice for deploying GNNs in resource-constrained settings.

Abstract

In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the GNN learns new node representations and fits them to a pyramid of coarsened graphs, which is computed offline in a pre-processing stage. NDP consists of three steps. First, a node decimation procedure selects the nodes belonging to one side of the partition identified by a spectral algorithm that approximates the MAXCUT solution. Afterwards, the selected nodes are connected with Kron reduction to form the coarsened graph. Finally, since the resulting graph is very dense, we apply a sparsification procedure that prunes the adjacency matrix of the coarsened graph to reduce the computational cost in the GNN. Notably, we show that it is possible to remove many edges without significantly altering the graph structure. Experimental results show that NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks.
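The three steps listed in the abstract can be illustrated with a minimal NumPy sketch of a single coarsening level. It is written under several assumptions that the abstract does not state: the partition is taken from the sign of the eigenvector associated with the largest eigenvalue of the combinatorial Laplacian, the non-negative side is the one kept, Kron reduction uses a pseudo-inverse, and the function name `ndp_coarsen` and the threshold `eps` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def ndp_coarsen(A, eps=1e-2):
    """One coarsening level (sketch): decimation, Kron reduction, sparsification.

    A is assumed to be a symmetric adjacency matrix with non-negative weights.
    Returns the coarsened adjacency and the indices of the kept nodes.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian

    # Step 1: spectral partition. eigh sorts eigenvalues in ascending order,
    # so the last column is the eigenvector of the largest eigenvalue.
    _, V = np.linalg.eigh(L)
    v_max = V[:, -1]
    keep = np.flatnonzero(v_max >= 0)       # assumed: keep the non-negative side
    drop = np.flatnonzero(v_max < 0)
    if keep.size == 0 or drop.size == 0:    # degenerate partition: nothing to do
        return A.copy(), np.arange(n)

    # Step 2: Kron reduction (Schur complement of the dropped block of L).
    L_kk = L[np.ix_(keep, keep)]
    L_kd = L[np.ix_(keep, drop)]
    L_dd = L[np.ix_(drop, drop)]
    L_red = L_kk - L_kd @ np.linalg.pinv(L_dd) @ L_kd.T  # L symmetric: L_dk = L_kd.T

    # Recover an adjacency matrix from the reduced Laplacian.
    A_red = -L_red
    np.fill_diagonal(A_red, 0.0)
    A_red = 0.5 * (A_red + A_red.T)         # remove numerical asymmetry

    # Step 3: sparsification by dropping small-weight edges.
    A_red[np.abs(A_red) < eps] = 0.0
    return A_red, keep

# Example: a 6-node ring. The spectral partition alternates around the ring,
# so three nodes are kept and Kron reduction connects them in a triangle.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A1, kept = ndp_coarsen(A)
```

Calling `ndp_coarsen` repeatedly on its own output would give one possible way to build the pyramid of coarsened graphs before training starts.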

Paper Structure

This paper contains 29 sections, 1 theorem, 26 equations, 15 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Let $\mathbf{Q}$ be a matrix used to remove small values in the adjacency matrix ${\mathbf A}$, which is defined as […]. Each eigenvalue $\bar{\alpha}_i$ of the sparsified adjacency matrix $\bar{{\mathbf A}} = {\mathbf A} + \mathbf{Q}$ is bounded by […], where $\alpha_i$ and ${\mathbf u}_i$ are eigenvalue-eigenvector pairs of ${\mathbf A}$.
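The displayed equations of the theorem are not available here, so the following sketch does not reproduce its bound. Instead, it numerically checks a related, standard fact (Weyl's inequality): for symmetric ${\mathbf A}$ and $\mathbf{Q}$, each sorted eigenvalue of ${\mathbf A} + \mathbf{Q}$ differs from the corresponding eigenvalue of ${\mathbf A}$ by at most the spectral norm of $\mathbf{Q}$. The thresholding rule used to build $\mathbf{Q}$ (cancel entries with magnitude at most `eps`) and the function names are assumptions made for illustration.

```python
import numpy as np

def sparsify(A, eps):
    """Return (A_bar, Q) with A_bar = A + Q.

    Assumed rule: Q cancels every entry of A whose magnitude is <= eps.
    """
    Q = np.where(np.abs(A) <= eps, -A, 0.0)
    return A + Q, Q

def eigen_shift_vs_weyl(A, eps):
    """Largest eigenvalue shift caused by sparsification, and Weyl's bound."""
    A_bar, Q = sparsify(A, eps)
    alpha = np.linalg.eigvalsh(A)          # ascending eigenvalues of A
    alpha_bar = np.linalg.eigvalsh(A_bar)  # ascending eigenvalues of A + Q
    max_shift = np.max(np.abs(alpha_bar - alpha))
    weyl_bound = np.linalg.norm(Q, 2)      # spectral norm of the perturbation
    return max_shift, weyl_bound

# Random symmetric weighted graph without self-loops.
rng = np.random.default_rng(0)
W = rng.random((8, 8))
A = np.triu(W, 1)
A = A + A.T

A_bar, Q = sparsify(A, eps=0.2)
shift, bound = eigen_shift_vs_weyl(A, eps=0.2)
```

Comparing `shift` with `bound` for several values of `eps` gives a rough feel for how much of the spectrum survives when low-weight edges are pruned.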

Figures (15)

  • Figure 1: Depiction of the proposed graph coarsening procedure. First, the nodes are partitioned into two sets according to a MAXCUT objective and then decimated by dropping one of the two sets ($\mathcal{V}^{-}$). Then, a coarsened Laplacian is built by connecting the remaining nodes with a graph reduction procedure. Finally, the edges with low weights in the new adjacency matrix obtained from the coarsened Laplacian are dropped to make the resulting graph sparser.
  • Figure 2: This example shows how it is possible to skip some MP operations on intermediate levels of the pyramid of coarsened graphs. Such a procedure shares analogies with pooling with a larger stride in traditional CNNs and can be considered as a higher-order graph pooling. After the first MP operation on ${\mathbf A}^{(0)}$, the node features are pooled by applying three decimation matrices in cascade, ${\mathbf S}^{(0)}$, ${\mathbf S}^{(1)}$, and ${\mathbf S}^{(2)}$. Afterwards, it is possible to directly perform an MP operation on ${\mathbf A}^{(3)}$, skipping the MP operations on ${\mathbf A}^{(1)}$ and ${\mathbf A}^{(2)}$.
  • Figure 3: (Left) distribution and values assumed by ${\mathbf v}_\text{max}$. (Right) distribution and values assumed by ${\mathbf v}_\text{max}$. The entries of the eigenvectors are sorted by node degree. A Stochastic Block Model graph was used in this example.
  • Figure 4: Blue line: fraction of edges cut by the partition yielded by the spectral algorithm. Orange line: fraction of edges removed by a random cut. Green line: the MAXCUT upper bound as a function of the largest eigenvalue $\lambda^s_\text{max}$ of the symmetric Laplacian. Black line: the threshold from Trevisan (2012) indicating the value of $\lambda^s_\text{max}/2$ below which one should switch to the random cut to obtain a solution guaranteed to be $\geq 0.53\cdot$MAXCUT. The x-axis indicates the density of the graph connectivity, which increases by randomly adding edges.
  • Figure 5: Top-left: Spectrum of the Laplacians associated with the original adjacency ${\mathbf A}^{(0)}$ and the coarsened versions ${\mathbf A}^{(1)}$, ${\mathbf A}^{(2)}$, and ${\mathbf A}^{(3)}$ obtained with the NDP algorithm. Top-right: Spectrum of the Laplacians associated with the sparsified adjacency matrices $\bar{{\mathbf A}}^{(1)}$, $\bar{{\mathbf A}}^{(2)}$, and $\bar{{\mathbf A}}^{(3)}$. Bottom: Absolute difference between the spectra of the Laplacians.
  • ...and 10 more figures
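The cascade of decimation matrices described in the Figure 2 caption can be sketched as follows. The sketch assumes that each ${\mathbf S}^{(l)}$ is a binary selection matrix that keeps a subset of rows of the node-feature matrix. The helper names and the kept-node indices are hypothetical and chosen by hand, whereas in NDP they would come from the spectral partition at each level.

```python
import numpy as np

def decimation_matrix(keep, n):
    """Binary selection matrix S of shape (len(keep), n).

    S @ X returns the rows of X listed in `keep` (assumed form of S).
    """
    S = np.zeros((len(keep), n))
    S[np.arange(len(keep)), keep] = 1.0
    return S

def pool_cascade(X, S_list):
    """Apply decimation matrices in order, with no MP step in between."""
    for S in S_list:
        X = S @ X
    return X

# Hypothetical pyramid: 8 -> 4 -> 2 -> 1 nodes. The indices at each level
# refer to the node ordering of that level, not of the original graph.
X0 = np.arange(16, dtype=float).reshape(8, 2)   # 8 nodes, 2 features each
S0 = decimation_matrix([0, 2, 4, 6], 8)
S1 = decimation_matrix([0, 2], 4)               # original nodes 0 and 4
S2 = decimation_matrix([1], 2)                  # original node 4
X3 = pool_cascade(X0, [S0, S1, S2])
```

Because every matrix only selects rows, the cascade is equivalent to a single selection of the nodes that survive all three levels, which is why the intermediate MP operations can be skipped without recomputing features.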

Theorems & Definitions (2)

  • Theorem 1
  • proof