Masked Graph Autoencoder with Non-discrete Bandwidths

Ziwen Zhao; Yuhua Li; Yixiong Zou; Jiliang Tang; Ruixuan Li


TL;DR

This work tackles the limited topological informativeness of discrete TopoRec methods by introducing Bandana, a masked graph autoencoder with continuous bandwidth masks sampled from a Boltzmann-Gibbs distribution and a layer-wise bandwidth prediction objective. The authors show that continuous bandwidths preserve global graph connectivity and enable fine-grained neighborhood discrimination, while linking the training objective to regularized denoising in a topological space; they further reinterpret bandwidth prediction as gradient optimization of the topological encoding distribution. Empirically, Bandana outperforms representative baselines on self-supervised link prediction and node classification across a broad set of datasets, and the dot-product probing evaluation provides a fair assessment of encoder quality. The approach offers a new paradigm for structure-learning pretext tasks in graph SSL, with theoretical grounding and practical benefits for topology-aware representation learning. Bandana thus advances topological learning by moving away from discrete masking toward informative, continuous masking and prediction within a principled topological framework.

Abstract

Masked graph autoencoders have emerged as a powerful graph self-supervised learning method that has yet to be fully explored. In this paper, we unveil that the existing discrete edge masking and binary link reconstruction strategies are insufficient to learn topologically informative representations, from the perspective of message propagation on graph neural networks. These limitations include blocking message flows, vulnerability to over-smoothness, and suboptimal neighborhood discriminability. Inspired by these understandings, we explore non-discrete edge masks, which are sampled from a continuous and dispersive probability distribution instead of the discrete Bernoulli distribution. These masks restrict the amount of output messages for each edge, referred to as "bandwidths". We propose a novel, informative, and effective topological masked graph autoencoder using bandwidth masking and a layer-wise bandwidth prediction objective. We demonstrate its powerful graph topological learning ability both theoretically and empirically. Our proposed framework outperforms representative baselines in both self-supervised link prediction (improving the discrete edge reconstructors by at most 20%) and node classification on numerous datasets, solely with a structure-learning pretext. Our implementation is available at https://github.com/Newiz430/Bandana.
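
The bandwidth masks described above can be illustrated with a minimal NumPy sketch. It assumes that each edge receives a random Gaussian energy and that bandwidths are obtained by a Boltzmann-Gibbs (softmax) normalization over the incoming edges of each destination node, with a temperature `tau`. The function name `sample_bandwidths`, its parameters, and the Gaussian energies are hypothetical choices for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def sample_bandwidths(dst, num_nodes, tau=1.0, rng=None):
    """Sample one continuous bandwidth in (0, 1) per edge.

    dst[e] is the destination node of edge e. Bandwidths of edges that
    share a destination sum to 1 (softmax over that node's incoming edges).
    Assumption: energies are i.i.d. standard normal; tau is a temperature.
    """
    rng = np.random.default_rng() if rng is None else rng
    dst = np.asarray(dst)
    energy = rng.standard_normal(dst.shape[0])  # hypothetical random energies
    logits = -energy / tau                      # Boltzmann-Gibbs: p ∝ exp(-E / tau)

    # Subtract the per-node maximum logit for numerical stability.
    node_max = np.full(num_nodes, -np.inf)
    np.maximum.at(node_max, dst, logits)
    weights = np.exp(logits - node_max[dst])

    # Normalize within each destination node's incoming edges.
    node_sum = np.zeros(num_nodes)
    np.add.at(node_sum, dst, weights)
    return weights / node_sum[dst]

# Six edges pointing at three nodes: node 0 has three incoming edges,
# node 1 has two, node 2 has one.
dst = np.array([0, 0, 0, 1, 1, 2])
bw = sample_bandwidths(dst, num_nodes=3, tau=0.5, rng=np.random.default_rng(0))
per_node_total = np.bincount(dst, weights=bw, minlength=3)
```

Unlike a Bernoulli mask, every sampled bandwidth is strictly positive, so no edge is removed and each node still receives some message from every neighbor. Under this sketch a smaller `tau` concentrates bandwidth on fewer edges, while a larger `tau` pushes the bandwidths toward uniform.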


Paper Structure (48 sections, 4 theorems, 15 equations, 8 figures, 12 tables)

This paper contains 48 sections, 4 theorems, 15 equations, 8 figures, 12 tables.

Introduction
Related work
Preliminaries
Notations and Concepts
TopoRec
A Message Propagation View of TopoRecs
Global uninformativeness: blocked message flows.
Local uninformativeness: indiscriminative neighborhood.
Bandana
Bandwidth Masking and Prediction Pipeline
Continuous bandwidth masks.
Encoding.
Bandwidth prediction.
Layer-wise masking and prediction.
Why Are Bandwidths Informative?
...and 33 more sections

Key Result

theorem 1

Let $\mathcal{G}_i=(\mathbf{X}^{\mathcal{G}_{i}}, \mathbf{A}^{\mathcal{G}_{i}})$ be an ego-graph with $n_i\ge2$. Assume $\boldsymbol{X}_j^{\mathcal{G}_{i}} = \boldsymbol{X}_k^{\mathcal{G}_{i}}$ for all $j,k \in \mathcal{N}_i$. Define the ego Dirichlet energy of $\mathcal{G}_i$ as ... If a connected component $\mathcal{G}_{i,m}$ of $\mathcal{G}_i$ is induced by imposing masks following the i.i.d. ...

Figures (8)

Figure 1: Discrete masks vs. the proposed bandwidths. (a) Traditional TopoRecs randomly mask a fixed proportion of edges and try to reconstruct them. However, messages from some neighboring nodes (e.g. the red one) as well as their predecessors will not be received by the target node (white). (b) We propose bandwidth masking and prediction, which first restricts the message propagated through each edge to varying degrees and then predicts how much it is restricted. The white node can now receive messages from every neighbor. (c) The connected component of Cora (Planetoid) before and after different masking schemes. Left: the discretely masked graph breaks the connectivity of the original component. Right: the bandwidth-masked graph (where the width and grayscale of each edge denote the assigned bandwidth) keeps the original graph topology intact, so the reconstructor learns more topologically informative representations. Best viewed in color.
Figure 2: Blocked message flows. (a) A toy example: two paths are available from A to E in the pentagonal graph. Left: edge (A,E) is masked out. Message from A can only reach E at the cost of being aggregated 3 more times. Right: (C,D) is also masked out. E is now out of reach of A (as well as B and C). (b) Node classification accuracy of a GCN pre-trained by two counterparts of MaskGAE (blue, magenta) and Bandana (red) w.r.t. the network depth.
Figure 3: Entropy histograms of the edge weight distribution in ego-graphs on Cora. Blue solid lines are the Gaussian kernel density estimation curves, and red dashed lines mark the medians.
Figure 4: Manifold learning visualizations. (a) The Swiss Roll. MaskGAE only learns suboptimal representations loosely scattered in the latent space, whereas Bandana learns a more compact surface. (b) The Two-moon. While MaskGAE does not give informative results, Bandana successfully learns the crescent-shaped topology.
Figure 5: Node embedding visualization of different masking strategies. (a) Karate Club, where each node is labeled by one of four classes (denoted by node colors). (b-d) Edges are masked by three types of masking strategies. (e-g) Latent graphs visualized by pairwise similarities of node embeddings. (h-j) The 2-dimensional embedding space constructed by t-SNE. Colors of different insets indicate three different node groups: community group (magenta), between-community group (red), and topological equivalence group (blue).
...and 3 more figures
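
The toy example in Figure 2(a) can be checked with a short breadth-first search. The sketch below assumes the pentagonal graph is the 5-cycle A-B-C-D-E-A, which is consistent with the caption but is not stated explicitly in it. The helper `hops` is a hypothetical name introduced here for illustration.

```python
from collections import deque

def hops(edges, src, dst):
    """Return the shortest-path length from src to dst, or None if unreachable."""
    adj = {}
    for u, v in edges:  # build an undirected adjacency map
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Assumed topology: a 5-cycle over the nodes A..E.
pentagon = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E")]

full = hops(pentagon, "A", "E")
one_mask = hops([e for e in pentagon if e != ("A", "E")], "A", "E")
two_masks = hops(
    [e for e in pentagon if e not in {("A", "E"), ("C", "D")}], "A", "E"
)
```

With the full graph, A reaches E in one hop. Masking (A,E) forces the message along A-B-C-D-E, which is four hops, i.e. three additional aggregations. Masking (C,D) as well leaves no path, so `two_masks` is `None`.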

Theorems & Definitions (6)

theorem 1: Vulnerability of discrete TopoRecs to over-smoothing
definition 1: Bandwidth
definition 2: Topological encoding
proposition 1: Non-discrete TopoRec is a denoising autoencoder
theorem 2: Bandwidth prediction optimizes in the topological encoding space
corollary 1: Bandana is energy-based
