HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs
Sunwoo Kim, Shinhwan Kang, Fanchen Bu, Soo Yong Lee, Jaemin Yoo, Kijung Shin
TL;DR
This paper tackles representation learning on hypergraphs by introducing hyperedge filling, a generative SSL task that leverages higher-order topology while avoiding the combinatorial explosion of unseen hyperedges. The authors propose HypeBoy, a hypergraph SSL method that combines augmentation, a hypergraph encoder, and projection heads trained with a hyperedge filling loss, together with a two-stage warm-up to prevent dimensional collapse. Theoretical analysis links the task to improved node classification, and experiments across 11 datasets show that HypeBoy outperforms both graph-based SSL methods adapted to hypergraphs and existing hypergraph SSL methods, in both pre-training and linear evaluation settings. The work demonstrates the value of topology-centric generative signals for learning robust, general-purpose hypergraph representations, with practical implications for label-scarce domains.
Abstract
Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple nodes with hyperedges, and better capturing the topology is essential for effective representation learning. Recent advances in generative self-supervised learning (SSL) suggest that hypergraph neural networks learned from generative self-supervision have the potential to effectively encode the complex hypergraph topology. Designing a generative SSL strategy for hypergraphs, however, is not straightforward. Questions remain with regard to the generative SSL task itself, its connection to downstream tasks, and the empirical properties of the learned representations. In light of these promises and challenges, we propose a novel generative SSL strategy for hypergraphs. We first formulate a generative SSL task on hypergraphs, hyperedge filling, and highlight its theoretical connection to node classification. Based on the generative SSL task, we propose a hypergraph SSL method, HypeBoy. HypeBoy learns effective general-purpose hypergraph representations, outperforming 16 baseline methods across 11 benchmark datasets.
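
The abstract names the hyperedge filling task without spelling out its objective. As a rough illustration only, the sketch below assumes one plausible reading: hide one node of a hyperedge, pool the remaining members into a query vector, and score every node in the hypergraph against that query with a softmax. The function name `filling_loss`, the sum-pooling aggregator, the dot-product scoring, and the toy embeddings are all assumptions made for this example, not details taken from the paper.

```python
import math

def filling_loss(embeddings, hyperedge, held_out):
    """Cross-entropy for predicting `held_out` from the other members of `hyperedge`.

    Assumed setup (not from the paper): sum-pooled query, dot-product scores,
    softmax over all nodes.
    """
    query_nodes = [v for v in hyperedge if v != held_out]
    dim = len(embeddings[0])
    # Sum-pool the remaining members into a single query vector.
    query = [sum(embeddings[v][d] for v in query_nodes) for d in range(dim)]
    # Score every node in the hypergraph against the query.
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in embeddings]
    # Numerically stable log-sum-exp over all candidate nodes.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    # Negative log-probability of the held-out node.
    return log_norm - scores[held_out]

# Toy embeddings: nodes 0, 1, 2 point the same way; node 3 points the opposite way.
embeddings = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [-1.0, 0.0]]
loss_aligned = filling_loss(embeddings, [0, 1, 2], held_out=2)
loss_misaligned = filling_loss(embeddings, [0, 1, 3], held_out=3)
```

Under these assumptions the softmax ranges over nodes rather than over candidate hyperedges, so its cost grows with the number of nodes instead of the number of possible node subsets. This is one way to read the claim that the task avoids a combinatorial explosion.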
