HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs

Sunwoo Kim; Shinhwan Kang; Fanchen Bu; Soo Yong Lee; Jaemin Yoo; Kijung Shin

TL;DR

This paper tackles representation learning on hypergraphs by introducing hyperedge filling, a generative SSL task that leverages higher-order topology while avoiding the combinatorial explosion of unseen hyperedges. The authors propose HypeBoy, a hypergraph SSL method that combines hypergraph augmentation, a hypergraph encoder, and projection heads trained with a hyperedge filling loss. The projection heads prevent dimensional collapse, and a two-stage training scheme further enhances the learned representations. Theoretical analysis links the task to improved node classification, and experiments on 11 benchmark datasets show that HypeBoy outperforms both graph-based SSL methods adapted to hypergraphs and existing hypergraph SSL methods, under both fine-tuned and linear evaluation. The work demonstrates the value of topology-centric generative signals for general-purpose hypergraph representations, with practical implications for label-scarce domains.

Abstract

Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple nodes with hyperedges, and better capturing this topology is essential for effective representation learning. Recent advances in generative self-supervised learning (SSL) suggest that hypergraph neural networks trained with generative self-supervision have the potential to effectively encode the complex hypergraph topology. Designing a generative SSL strategy for hypergraphs, however, is not straightforward. Questions remain regarding the generative SSL task itself, its connection to downstream tasks, and the empirical properties of the learned representations. In light of these promises and challenges, we propose a novel generative SSL strategy for hypergraphs. We first formulate a generative SSL task on hypergraphs, hyperedge filling, and highlight its theoretical connection to node classification. Based on this task, we propose a hypergraph SSL method, HypeBoy. HypeBoy learns effective general-purpose hypergraph representations, outperforming 16 baseline methods across 11 benchmark datasets.
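
The hyperedge filling task can be illustrated with a small sketch. It is not the authors' implementation: the function names (`filling_pairs`, `filling_probs`, `filling_loss`), the dot-product scoring between a node embedding and the summed query embeddings, and the random embeddings `Z` standing in for encoder output are all assumptions made for illustration. The sketch enumerates (query subset, missing node) pairs from each hyperedge and computes a softmax cross-entropy over all candidate nodes.

```python
import numpy as np


def filling_pairs(hyperedges):
    """Yield (query_subset, missing_node) for every node of every hyperedge."""
    for edge in hyperedges:
        nodes = sorted(edge)
        if len(nodes) < 2:
            continue  # a singleton hyperedge would leave an empty query
        for missing in nodes:
            query = [v for v in nodes if v != missing]
            yield query, missing


def filling_probs(Z, query):
    """Probability of each node filling the query.

    Assumed scoring: softmax over all nodes of <z_i, sum of query embeddings>.
    """
    q = Z[query].sum(axis=0)
    scores = Z @ q
    scores = scores - scores.max()  # numerical stability
    p = np.exp(scores)
    return p / p.sum()


def filling_loss(Z, hyperedges):
    """Mean negative log-likelihood of the true missing node."""
    losses = [-np.log(filling_probs(Z, q)[m])
              for q, m in filling_pairs(hyperedges)]
    return float(np.mean(losses))


# Toy hypergraph with 4 nodes and 2 hyperedges (hypothetical data).
hyperedges = [{0, 1, 2}, {2, 3}]
rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 8))  # stand-in for encoder output
pairs = list(filling_pairs(hyperedges))
loss = filling_loss(Z, hyperedges)
```

Each hyperedge of size k contributes k training pairs, so the task only ever uses observed hyperedges and needs no sampling of unseen node sets.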

Paper Structure (44 sections, 3 theorems, 52 equations, 6 figures, 12 tables)

Introduction
Related Work
Proposed Task and Theoretical Analysis
Proposed task: Hyperedge filling
Theoretical results on hyperedge filling
Basic setting
Hyperedge filling helps node classification
Proposed Method for Hyperedge Filling
Step 1: Hypergraph augmentation
Step 2: Hypergraph encoding
Step 3: Hyperedge filling loss
Two-stage training scheme for further enhancement
Experimental Results
Efficacy as a pre-training technique (fine-tuned evaluation)
Efficacy as a general-purpose embedding technique (linear evaluation)
...and 29 more sections

Key Result

Theorem 1

Consider a hyperedge $e_{j}$ such that $e_{j} \cap C_{1} \neq \emptyset$, with node features $\mathbf{X}$ generated under the paper's feature-label assumption. For any node $v_{i} \in e_{j} \cap C_{1}$, the following holds:

Figures (6)

Figure 1: Overview of (a) the hyperedge filling task and (b) HypeBoy, our proposed SSL method based on the task. The goal of the task is to find the missing node for a given query subset (i.e., the other nodes in a hyperedge). HypeBoy trains HNNs aiming to correctly predict the missing node.
Figure 2: Analysis regarding Property 2 (prevention of dimensional collapse) and Property 3 (representation uniformity and alignment) of HypeBoy. As shown in (a), while HypeBoy without projection heads (red) suffers from dimensional collapse, HypeBoy (blue) does not, demonstrating the necessity of the projection head. Furthermore, as shown in (b), representations from an HNN trained by HypeBoy meet both uniformity and alignment, justifying our design choice of the loss function. Experiments are conducted on the Cora dataset.
Figure 3: Empirical demonstration of Theorem 2. Note that $S$ denotes the size of a hyperedge, and $d$ denotes the dimension of features. As stated, $P_{\boldsymbol{x}, e}\left(\vec{\mathbf{1}}^{T}\left(\sum_{v_{k} \in q}\boldsymbol{x}_{k}\right) > 0\;\middle|\; \mathscr{P} \right)$ is strictly increasing in $\mathscr{P} \in [0.5, 1]$ (statement 2), and lower bounded by $0.5$ (statement 1).
Figure 4: Analyzing dimensional collapse of HypeBoy with/without projection heads on five benchmark datasets. While HypeBoy does not suffer from dimensional collapse, its variant without projection heads does.
Figure 5: Analyzing alignment and uniformity of representations obtained via HypeBoy. In most cases, representations obtained by HypeBoy achieve both alignment and uniformity.
...and 1 more figure
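
The properties discussed in the captions for Figures 2, 4, and 5 (dimensional collapse, alignment, uniformity) can be checked with standard diagnostics. The sketch below is an assumption about how such checks are commonly done, not a description of the paper's exact measurement protocol: dimensional collapse is probed through the singular values of the embedding covariance matrix, and alignment and uniformity use the usual formulations on L2-normalized embeddings. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def singular_spectrum(Z):
    """Singular values (descending) of the covariance of embeddings Z.

    Many near-zero values indicate dimensional collapse.
    """
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / max(len(Z) - 1, 1)
    return np.linalg.svd(cov, compute_uv=False)


def l2_normalize(Z):
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


def alignment(A, B):
    """Mean squared distance between paired (positive) embeddings; lower is better."""
    A, B = l2_normalize(A), l2_normalize(B)
    return float(np.mean(np.sum((A - B) ** 2, axis=1)))


def uniformity(Z, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs; lower is better."""
    Zn = l2_normalize(Z)
    sq = np.sum((Zn[:, None, :] - Zn[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(Zn), k=1)
    return float(np.log(np.mean(np.exp(-t * sq[iu]))))


# Synthetic embeddings: one full-rank set, one collapsed onto 2 dimensions.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(200, 8))
collapsed = healthy[:, :2] @ rng.normal(size=(2, 8))
s_healthy = singular_spectrum(healthy)
s_collapsed = singular_spectrum(collapsed)
```

Plotting the logarithm of `s_healthy` and `s_collapsed` against the singular value index gives the kind of spectrum comparison typically used to show collapse: the collapsed set drops to numerical zero after the second value.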

Theorems & Definitions (12)

Theorem 1: Improvement in effectiveness
proof
Theorem 2: Realization of condition
proof
proof
Remark
proof
Remark
Definition 1: Solved hyperedge filling task
Definition 2: Reasonable solution
...and 2 more
