Range-aware Positional Encoding via High-order Pretraining: Theory and Practice
Viet Anh Nguyen, Nhat Khang Ngo, Truong Son Hy
TL;DR
This work tackles data scarcity and domain dependence in graph pretraining by introducing HOPE-WavePE, a range-aware, domain-agnostic pretraining framework that learns to reconstruct high-order adjacencies from multi-resolution spectral graph wavelet signals. It combines a fast, scalable graph wavelet transform with a high-order permutation-equivariant autoencoder to produce node-level structural embeddings that generalize across tasks and domains. The approach is supported by a theoretical result guaranteeing the ability to approximate a weighted sum of multi-hop adjacencies and is validated through extensive experiments on MoleculeNet, LRGB, TU datasets, MNIST/CIFAR-10, and ZINC, showing improved performance and transferability. Overall, HOPE-WavePE provides a principled, scalable pathway toward general graph structure encoders and graph foundation models, with practical impact for molecule property prediction, materials science, and beyond.
Abstract
Unsupervised pre-training on vast amounts of graph data is critical in real-world applications where labeled data is limited, such as molecular property prediction or materials science. Existing approaches pre-train models for specific graph domains, neglecting the inherent connections within networks. This limits their ability to transfer knowledge to various supervised tasks. In this work, we propose a novel pre-training strategy on graphs that focuses on modeling their multi-resolution structural information, allowing us to capture global information of the whole graph while preserving local structures around its nodes. We extend Wavelet Positional Encoding (WavePE) (Ngo et al., 2023) by pretraining a High-Order Permutation-Equivariant Autoencoder (HOPE-WavePE) to reconstruct node connectivities from their multi-resolution wavelet signals. Unlike existing positional encodings, our method is designed to be sensitive to the input graph size in downstream tasks, which allows it to efficiently capture global structure on graphs. Since our approach relies solely on the graph structure, it is also domain-agnostic and adaptable to datasets from various domains, therefore paving the way for developing general graph structure encoders and graph foundation models. We theoretically demonstrate that there exists a parametrization of this architecture that can predict the output adjacency up to arbitrarily low error. We also evaluate HOPE-WavePE on graph-level prediction tasks from different areas and show its superiority over other methods.
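To make the inputs and targets of the pretraining objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a heat-kernel wavelet exp(-sL) on the symmetric normalized Laplacian, computed by full eigendecomposition rather than the fast transform mentioned above, and it uses binarized powers of the adjacency matrix as stand-ins for the multi-hop connectivities the autoencoder is asked to reconstruct. The function names, the choice of scales, and the example graph are all hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep weight 0
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def wavelet_signals(adj, scales):
    """Stack of heat-kernel wavelets exp(-s L), one (n, n) matrix per scale s.

    Assumption: the heat kernel is used as the wavelet filter. Small scales
    stay close to the identity (local view); larger scales diffuse further.
    """
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))
    return np.stack(
        [(eigvecs * np.exp(-s * eigvals)) @ eigvecs.T for s in scales]
    )

def multi_hop_targets(adj, max_hops):
    """Binarized A^k for k = 1..max_hops: 1 where a walk of length k exists."""
    adj = np.asarray(adj, dtype=float)
    power = np.eye(adj.shape[0])
    targets = []
    for _ in range(max_hops):
        power = power @ adj
        targets.append((power > 0).astype(float))
    return np.stack(targets)

# Hypothetical example: a path graph 0 - 1 - 2 - 3.
path = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
signals = wavelet_signals(path, scales=[0.0, 1.0, 4.0])  # shape (3, 4, 4)
targets = multi_hop_targets(path, max_hops=3)            # shape (3, 4, 4)
```

In this sketch, an encoder would consume `signals` and a decoder would be trained to predict `targets`. Because both depend only on the adjacency matrix, the same procedure applies to graphs from any domain.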
