Hypergraph Foundation Model

Yue Gao, Yifan Feng, Shiquan Liu, Xiangmin Han, Shaoyi Du, Zongze Wu, Han Hu

TL;DR

This work identifies a core obstacle to building foundation models for hypergraphs: hypergraph data combines text-attributed vertex features with complex high-order structure. It introduces Hyper-FM, a two-module framework comprising Hierarchical High-Order Neighbor Guided Vertex Knowledge Embedding and Hierarchical Multi-Hypergraph Guided Structural Knowledge Extraction, and evaluates it on 11 text-attributed hypergraph (TAHG) datasets. With multi-domain pretraining followed by domain-specific fine-tuning, Hyper-FM achieves an average improvement of about 13.4% over baselines. The authors also report a scaling law in which domain diversity, rather than vertex or hyperedge count, drives performance. The study curates the first large suite of TAHG datasets and analyzes ablations and design choices, opening a new direction for hypergraph-aware foundation models.

Abstract

Hypergraph neural networks (HGNNs) model complex high-order relationships in domains such as protein interactions and social networks by connecting multiple vertices through hyperedges. This enhances modeling capability and reduces information loss. Developing foundation models for hypergraphs is challenging because hypergraph data includes both vertex features and intricate structural information. We present Hyper-FM, a Hypergraph Foundation Model for multi-domain knowledge extraction. It features Hierarchical High-Order Neighbor Guided Vertex Knowledge Embedding for vertex feature representation and Hierarchical Multi-Hypergraph Guided Structural Knowledge Extraction for structural information. Additionally, we curate 11 text-attributed hypergraph datasets to advance research at the intersection of HGNNs and LLMs. Experiments on these datasets show that Hyper-FM outperforms baseline methods by approximately 13.4%, validating our approach. Furthermore, we propose the first scaling law for hypergraph foundation models. It shows that increasing domain diversity significantly enhances performance, whereas merely increasing vertex and hyperedge counts does not yield comparable gains. This underscores the critical role of domain diversity in scaling hypergraph models.
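The claim that hyperedges reduce information loss can be made concrete with a small example. The sketch below is illustrative only and is not Hyper-FM's vertex-embedding or structure-extraction module. It builds an incidence matrix H and shows that replacing hyperedges with pairwise cliques maps two different hypergraphs onto the same graph. It then applies one layer of the standard HGNN convolution, X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Θ, which keeps the two hypergraphs apart. The helper names (incidence_matrix, clique_expansion, hgnn_conv) are hypothetical. The sketch also assumes that every vertex belongs to at least one hyperedge.

```python
import numpy as np


def incidence_matrix(num_vertices, hyperedges):
    """|V| x |E| incidence matrix H: H[v, e] = 1 if vertex v belongs to hyperedge e."""
    H = np.zeros((num_vertices, len(hyperedges)))
    for e, members in enumerate(hyperedges):
        H[sorted(members), e] = 1.0
    return H


def clique_expansion(hyperedges):
    """Pairwise edge set obtained by replacing every hyperedge with a clique."""
    edges = set()
    for members in hyperedges:
        m = sorted(members)
        for i in range(len(m)):
            for j in range(i + 1, len(m)):
                edges.add((m[i], m[j]))
    return edges


def hgnn_conv(X, H, Theta, w=None):
    """One HGNN-style layer: Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta.
    Assumes every vertex lies in at least one hyperedge (no zero degrees)."""
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    dv = H @ w            # weighted vertex degrees d(v) = sum_e w(e) h(v, e)
    de = H.sum(axis=0)    # hyperedge degrees (number of vertices per hyperedge)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    # W and De^-1 are both diagonal, so their product is diag(w / de).
    P = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return P @ X @ Theta


# One 3-vertex hyperedge vs. three pairwise edges on the same vertices.
one_triple = [{0, 1, 2}]
three_pairs = [{0, 1}, {1, 2}, {0, 2}]

# Clique expansion maps both to the same triangle, so the distinction is lost.
print(clique_expansion(one_triple) == clique_expansion(three_pairs))  # True

# The hypergraph operator keeps them apart.
X, Theta = np.eye(3), np.eye(3)
out_triple = hgnn_conv(X, incidence_matrix(3, one_triple), Theta)
out_pairs = hgnn_conv(X, incidence_matrix(3, three_pairs), Theta)
print(np.allclose(out_triple, out_pairs))  # False
```

With identity features, the single 3-vertex hyperedge spreads each vertex's signal uniformly (1/3 to every vertex). The three pairwise edges instead keep half of each signal on the vertex itself. This is the kind of high-order distinction a pairwise graph cannot represent.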

Paper Structure

This paper contains 35 sections, 24 equations, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Illustration of negative transfer phenomenon in hypergraph datasets.
  • Figure 2: Pipeline of the proposed Hypergraph Foundation Model (Hyper-FM).
  • Figure 3: Illustration of the Hierarchical High-Order Neighbor Guided Vertex Knowledge Embedding Module.
  • Figure 4: Illustration of sampling and building Hierarchical Multi-Hypergraphs for pretraining.
  • Figure 5: Experimental results of ablation on the vertex number for the sub-hypergraph sampling for each hypergraph domain dataset.
  • (1 further figure not listed)
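Figure 5 ablates the vertex number used for sub-hypergraph sampling in each domain. The summary does not describe the sampler itself. The sketch below is therefore a hypothetical BFS-style sampler, not the paper's procedure, and is included only to show what a vertex budget controls. Starting from a seed vertex, it grows a vertex set through shared hyperedges until it reaches max_vertices. It then restricts each hyperedge to the sampled vertices and drops any that keep fewer than two. The function and parameter names (sample_subhypergraph, seed_vertex, max_vertices) are assumptions.

```python
import random
from collections import deque


def sample_subhypergraph(hyperedges, seed_vertex, max_vertices, rng=None):
    """Hypothetical sampler: expand from seed_vertex through shared hyperedges
    until max_vertices vertices are collected, then induce the sub-hypergraph."""
    rng = rng if rng is not None else random.Random(0)
    # Map each vertex to the indices of the hyperedges that contain it.
    incident = {}
    for e, members in enumerate(hyperedges):
        for v in members:
            incident.setdefault(v, []).append(e)

    sampled = {seed_vertex}
    queue = deque([seed_vertex])
    while queue and len(sampled) < max_vertices:
        v = queue.popleft()
        edges = incident.get(v, [])[:]   # copy before shuffling
        rng.shuffle(edges)
        for e in edges:
            neighbors = sorted(set(hyperedges[e]) - sampled)
            rng.shuffle(neighbors)
            for u in neighbors:
                if len(sampled) >= max_vertices:
                    break
                sampled.add(u)
                queue.append(u)

    # Keep each hyperedge restricted to sampled vertices; drop those with < 2 left.
    sub_edges = []
    for members in hyperedges:
        kept = frozenset(set(members) & sampled)
        if len(kept) >= 2 and kept not in sub_edges:
            sub_edges.append(kept)
    return sampled, sub_edges


hyperedges = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {5, 6}]
vertices, sub_edges = sample_subhypergraph(hyperedges, seed_vertex=0, max_vertices=4)
print(sorted(vertices))                        # [0, 1, 2, 3]
print(sorted(sorted(e) for e in sub_edges))    # [[0, 1, 2], [2, 3]]
```

A larger max_vertices yields sub-hypergraphs that keep more of each domain's high-order structure, at a higher per-sample cost. That trade-off is what an ablation over the sampled vertex number would probe.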