MDL-Pool: Adaptive Multilevel Graph Pooling Based on Minimum Description Length

Jan von Pichowski; Christopher Blöcker; Ingo Scholtes

TL;DR

MDL-Pool introduces an adaptive multilevel graph pooling operator driven by the minimum description length principle and the multilevel map equation. By jointly optimizing hierarchical clustering across levels and enabling per-graph depth selection, it overcomes fixed-depth limitations of prior pooling methods without requiring explicit regularization. Empirical results on community detection and graph classification show competitive or state-of-the-art performance, with the model automatically inferring the number of clusters and depth per graph. This approach offers a principled, parameter-free mechanism for concise graph representations and interpretable hierarchical structure.

Abstract

Graph pooling compresses graphs and summarises their topological properties and features in a vectorial representation. It is an essential part of deep graph representation learning and is indispensable in graph-level tasks like classification or regression. Current approaches pool hierarchical structures in graphs by iteratively applying shallow pooling operators up to a fixed depth. However, they disregard the interdependencies between structures at different hierarchical levels and do not adapt to datasets whose graphs differ in size and may therefore require different pooling depths. To address these issues, we propose MDL-Pool, a pooling operator based on the minimum description length (MDL) principle, whose loss formulation explicitly models the interdependencies between different hierarchical levels and facilitates a direct comparison between multiple pooling alternatives with different depths. MDL-Pool builds on the map equation, an information-theoretic objective function for community detection, which naturally implements Occam's razor and balances model complexity against goodness-of-fit via the MDL. We demonstrate MDL-Pool's competitive performance in an empirical evaluation against various baselines across standard graph classification datasets.
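
To make the idea of comparing descriptions of different depths concrete, the following is a minimal sketch of the standard two-level map equation for an undirected graph with hard cluster assignments, compared against the one-level (no-pooling) codelength $\mathcal{H}(P)$. It is not the authors' implementation: MDL-Pool learns soft assignments with a neural network and uses the multilevel map equation, whereas the function names, the toy graph, and the hard `labels` vector below are assumptions chosen for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero entries are ignored."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def one_level_codelength(A):
    """Codelength without modules: entropy of the node visit rates."""
    A = np.asarray(A, dtype=float)
    p = A.sum(axis=1) / A.sum()  # visit rates of an undirected random walk
    return entropy(p)

def two_level_codelength(A, labels):
    """Two-level map equation for a hard partition (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    total = A.sum()
    p = A.sum(axis=1) / total
    modules = np.unique(labels)
    # Exit rate of each module: weight of edges leaving it, normalised.
    q = np.array([A[labels == m][:, labels != m].sum() / total for m in modules])
    codelength = 0.0
    if q.sum() > 0:
        # Index codebook: describes which module the walker enters.
        codelength += q.sum() * entropy(q / q.sum())
    for m, q_m in zip(modules, q):
        inside = p[labels == m]
        p_m = q_m + inside.sum()  # usage rate of this module's codebook
        codelength += p_m * entropy(np.append(inside, q_m) / p_m)
    return codelength

# Hypothetical toy graph: two triangles joined by a single edge (2-3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])

one_level = one_level_codelength(A)
two_level = two_level_codelength(A, labels)
# Pick whichever description is shorter, in the spirit of MDL.
best = "two-level" if two_level < one_level else "one-level"
```

On this toy graph the two-triangle partition yields a shorter description than the no-pooling baseline, so a minimum-description-length criterion would prefer it. Putting all nodes into one module makes the two-level value collapse to the one-level entropy.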

Paper Structure (37 sections, 28 equations, 6 figures, 7 tables)

Introduction
Related Work
The minimum description length principle:
Graph clustering:
Graph pooling:
Background
Hierarchical graph pooling
The Map Equation
Adaptive Multilevel Pooling with Map Equation Loss
Pooling Architecture
Pooling Building Block:
Hierarchical pooling:
Adaptive Depth via Minimum Description Length:
Downstream task:
Optimisation with the Multilevel Map Equation
...and 22 more sections

Figures (6)

Figure 1: (a) Generic pooling block based on SEL-RED-CON. (b) Our multilevel pooling setup, here with up to two levels. Matrix subscripts and superscripts denote the pooling depth and number of performed pooling steps, respectively. For example, $\mathbf{S}^1_2$ is the cluster assignment matrix in the 2-level pooling case after 1 pooling step. $\mathcal{L}_0 = \mathcal{H}\left(P\right)$ is the no-pooling codelength. An extension to more levels is possible due to our adaptable loss function (see \ref{appx:map-equation-expansion}).
Figure 2: Node assignments learned by MDL-Pool.
Figure 3: MDL-Pool tends to learn clean node assignment with little overlap between clusters.
Figure 4: Distribution of number of pooling layers for graphs (train, val, test) selected by MDL-Pool.
Figure 5: Coding principles behind the map equation. (a) All nodes are assigned to the same module and receive unique codewords, constructed with a Huffman code \cite{huffman-coding} based on their ergodic visit rates. The black trace shows a possible sequence of node visits by a random walker. For each step, we use one codeword, resulting in the sequence of codewords shown at the bottom. (b) The nodes are partitioned into nested modules where colours show module memberships. With modules, we assign unique codewords within modules, but we must also define codewords for describing module entries and exits. These are shown next to the coloured arrows pointing into and out of modules. Now, a random walker step requires one, three, or five codewords, depending on how many module boundaries are crossed. With these codes, the codelength for the same sequence of nodes becomes shorter, as shown at the bottom.
...and 1 more figure
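
The one-module coding scheme described for Figure 5(a) can be illustrated with a small sketch: build a binary Huffman code from node visit rates and compare its average codeword length with the entropy of those rates. The graph is a hypothetical example, not the one drawn in the figure, and `huffman_code_lengths` is an illustrative helper name. The map equation itself works with the entropy bound rather than with actual codewords.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Every symbol below the merged node gains one bit.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths

# Visit rates of an undirected random walk are proportional to node degree.
# Hypothetical graph: two triangles joined by one edge, degrees as below.
degrees = [2, 2, 3, 3, 2, 2]
visit_rates = [d / sum(degrees) for d in degrees]

lengths = huffman_code_lengths(visit_rates)
average_length = sum(p * l for p, l in zip(visit_rates, lengths))
entropy_bits = -sum(p * math.log2(p) for p in visit_rates)
```

Frequently visited nodes receive shorter codewords, and the average length per step stays within one bit of the entropy. This is why the entropy of the visit rates can serve as the codelength of the one-module description.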
