Hypergraph Self-supervised Learning with Sampling-efficient Signals

Fan Li, Xiaoyang Wang, Dawei Cheng, Wenjie Zhang, Ying Zhang, Xuemin Lin

TL;DR

This paper tackles the inefficiency and sampling bias of existing hypergraph self-supervised learning by introducing SE-HSSL, a sampling-efficient framework built on three signals: node-level and group-level CCA objectives, both sampling-free, and a hierarchical membership-level contrast that exploits the overlap structure of hyperedges. The approach uses a shared HGNN encoder and two augmented views generated by node feature masking and membership masking, and it optimizes a joint objective that combines invariance and decorrelation terms while avoiding large-scale negative sampling. Experiments on seven real-world hypergraphs show that SE-HSSL achieves state-of-the-art or competitive performance in node classification and clustering, with training speedups of at least 2x over TriCL, the current state-of-the-art method. The work advances high-order hypergraph representation learning by reducing sampling bias and computational cost, making SSL more scalable for complex hypergraph data without sacrificing downstream utility.
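The two augmentations named above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `mask_features` and `mask_memberships`, the dense incidence matrix `H`, and the choice to mask whole feature dimensions with probability `p` are all assumptions made for the example.

```python
import numpy as np

def mask_features(X, p, rng):
    """Zero out each feature dimension (column of X) with probability p.

    Assumption: masking is applied per dimension and shared by all nodes.
    """
    keep = rng.random(X.shape[1]) >= p
    return X * keep  # broadcasts the column mask over all rows

def mask_memberships(H, p, rng):
    """Drop each node-hyperedge membership (entry of the incidence matrix H)
    independently with probability p."""
    keep = rng.random(H.shape) >= p
    return H * keep

# Toy hypergraph: 5 nodes, 4 features, 3 hyperedges (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)

# Two independently augmented views, each a (features, incidence) pair.
view_1 = (mask_features(X, 0.3, rng), mask_memberships(H, 0.2, rng))
view_2 = (mask_features(X, 0.3, rng), mask_memberships(H, 0.2, rng))
```

Both views would then be passed through the same encoder; the masking probabilities here are placeholder values, not settings taken from the paper.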

Abstract

Self-supervised learning (SSL) provides a promising alternative for representation learning on hypergraphs without costly labels. However, existing hypergraph SSL models are mostly based on contrastive methods with the instance-level discrimination strategy, suffering from two significant limitations: (1) They select negative samples arbitrarily, which is unreliable in deciding similar and dissimilar pairs, causing training bias. (2) They often require a large number of negative samples, resulting in expensive computational costs. To address the above issues, we propose SE-HSSL, a hypergraph SSL framework with three sampling-efficient self-supervised signals. Specifically, we introduce two sampling-free objectives leveraging the canonical correlation analysis as the node-level and group-level self-supervised signals. Additionally, we develop a novel hierarchical membership-level contrast objective motivated by the cascading overlap relationship in hypergraphs, which can further reduce membership sampling bias and improve the efficiency of sample utilization. Through comprehensive experiments on 7 real-world hypergraphs, we demonstrate the superiority of our approach over the state-of-the-art method in terms of both effectiveness and efficiency.
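A sampling-free, CCA-style objective of the kind described above might look like the sketch below. It is an illustration under stated assumptions rather than the paper's exact loss: the function names, the weight `lam`, and the specific form (a squared-difference invariance term plus a penalty pushing each view's feature covariance toward the identity) are assumed here, following a common formulation of CCA-based self-supervision. No negative samples appear anywhere in the computation.

```python
import numpy as np

def standardize(Z, eps=1e-8):
    """Center each embedding dimension, scale it to unit variance, and divide
    by sqrt(N) so that Z.T @ Z approximates a correlation matrix."""
    Z = Z - Z.mean(axis=0)
    Z = Z / (Z.std(axis=0) + eps)
    return Z / np.sqrt(Z.shape[0])

def cca_style_loss(Z1, Z2, lam=0.5):
    """Invariance plus decorrelation loss between two views' embeddings.

    Z1, Z2: (N, d) embeddings of the same N items under two augmentations.
    lam: assumed trade-off weight between the two terms.
    """
    Z1, Z2 = standardize(Z1), standardize(Z2)
    identity = np.eye(Z1.shape[1])
    # Invariance: embeddings of the same item should agree across views.
    invariance = np.sum((Z1 - Z2) ** 2)
    # Decorrelation: different embedding dimensions should be uncorrelated.
    decorrelation = (np.sum((Z1.T @ Z1 - identity) ** 2)
                     + np.sum((Z2.T @ Z2 - identity) ** 2))
    return invariance + lam * decorrelation

# Hypothetical embeddings: 50 items, 3 dimensions, second view slightly noisy.
rng = np.random.default_rng(0)
Z_a = rng.normal(size=(50, 3))
Z_b = Z_a + 0.1 * rng.normal(size=(50, 3))
loss = cca_style_loss(Z_a, Z_b)
```

Under this sketch the same function could be applied to node embeddings for a node-level signal and to hyperedge embeddings for a group-level one; the cost depends on the embedding dimension rather than on a number of sampled negatives.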

Paper Structure (32 sections, 17 equations, 4 figures, 4 tables)

This paper contains 32 sections, 17 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The architecture of the hypergraph self-supervised learning framework SE-HSSL.
  • Figure 2: Hierarchical membership relation.
  • Figure 3: Parameter sensitivity test.
  • Figure 4: The training time comparison between SE-HSSL and TriCL. log($\cdot$) represents the natural logarithm.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2