Hypergraph Foundation Model
Yue Gao, Yifan Feng, Shiquan Liu, Xiangmin Han, Shaoyi Du, Zongze Wu, Han Hu
TL;DR
This work addresses the challenge of building foundation models for hypergraphs, whose data combine vertex features with complex structural information. It introduces Hyper-FM, a two-module framework comprising Hierarchical High-Order Neighbor Guided Vertex Knowledge Embedding and Hierarchical Multi-Hypergraph Guided Structural Knowledge Extraction, and evaluates it on 11 text-attributed hypergraph (TAHG) datasets. Through multi-domain pretraining and domain-specific fine-tuning, Hyper-FM achieves an average improvement of about 13.4% over baselines and uncovers a scaling law showing that domain diversity, not vertex or hyperedge count, drives performance. The study also curates the first large suite of TAHG datasets and analyzes ablations and design choices, establishing a new direction for hypergraph-aware foundation models.
Abstract
Hypergraph neural networks (HGNNs) effectively model complex high-order relationships in domains such as protein interactions and social networks by connecting multiple vertices through hyperedges, which enhances modeling capability and reduces information loss. Developing foundation models for hypergraphs is challenging because hypergraph data are distinctive, combining vertex features with intricate structural information. We present Hyper-FM, a Hypergraph Foundation Model for multi-domain knowledge extraction, featuring Hierarchical High-Order Neighbor Guided Vertex Knowledge Embedding for vertex feature representation and Hierarchical Multi-Hypergraph Guided Structural Knowledge Extraction for structural information. Additionally, we curate 11 text-attributed hypergraph datasets to advance research at the intersection of HGNNs and LLMs. Experiments on these datasets show that Hyper-FM outperforms baseline methods by approximately 13.4%, validating our approach. Furthermore, we propose the first scaling law for hypergraph foundation models, demonstrating that increasing domain diversity significantly enhances performance, whereas merely increasing vertex and hyperedge counts does not. This underscores the critical role of domain diversity in scaling hypergraph models.
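
The information-loss claim for hyperedges can be illustrated with a small sketch. It is not taken from the paper: the function name `clique_expansion` and the toy hypergraphs are assumptions chosen for illustration. The sketch shows that two different hypergraphs collapse to the same ordinary graph once each hyperedge is replaced by pairwise edges, so the pairwise view cannot tell a single three-way relation from three two-way relations.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace each hyperedge with all pairwise edges among its vertices.

    Hypothetical helper for illustration; not part of Hyper-FM.
    """
    edges = set()
    for hyperedge in hyperedges:
        # Sort so that each undirected pair has one canonical form (u < v).
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges


# Two different hypergraphs on the vertices {0, 1, 2}.
one_triple = [{0, 1, 2}]                # a single 3-vertex hyperedge
three_pairs = [{0, 1}, {1, 2}, {0, 2}]  # three 2-vertex hyperedges

# Both collapse to the same triangle, so the distinction is lost.
same_graph = clique_expansion(one_triple) == clique_expansion(three_pairs)
```

Here `same_graph` evaluates to `True`, which is the sense in which keeping hyperedges as first-class objects preserves structure that a pairwise graph discards.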
