Hyper-KGGen: A Skill-Driven Knowledge Extractor for High-Quality Knowledge Hypergraph Generation

Rizhuo Huang, Yifan Feng, Rundong Xue, Shihui Ying, Jun-Hai Yong, Chuan Shi, Shaoyi Du, Yue Gao

TL;DR

Hyper-KGGen is a skill-driven framework that reformulates extraction as a dynamic skill-evolving process. It significantly outperforms strong baselines, validating that evolved skills provide substantially richer guidance than static few-shot examples in multi-scenario settings.

Abstract

Knowledge hypergraphs surpass traditional binary knowledge graphs by encapsulating complex $n$-ary atomic facts, providing a more comprehensive paradigm for semantic representation. However, constructing high-quality hypergraphs remains challenging due to the scenario gap: generic extractors struggle to generalize across diverse domains with specific jargon, while existing methods often fail to balance structural skeletons with fine-grained details. To bridge this gap, we propose Hyper-KGGen, a skill-driven framework that reformulates extraction as a dynamic skill-evolving process. First, Hyper-KGGen employs a coarse-to-fine mechanism to systematically decompose documents, ensuring full-dimensional coverage from binary links to complex hyperedges. Crucially, it incorporates an adaptive skill acquisition module that actively distills domain expertise into a Global Skill Library. This is achieved via a stability-based feedback loop, where extraction stability serves as a relative reward signal to induce high-quality skills from unstable traces and missed predictions. Additionally, we present HyperDocRED, a rigorously annotated benchmark for document-level knowledge hypergraph extraction. Experiments demonstrate that Hyper-KGGen significantly outperforms strong baselines, validating that evolved skills provide substantially richer guidance than static few-shot examples in multi-scenario settings.
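
To make the abstract's two central notions concrete, the following is a minimal sketch of an $n$-ary fact stored as a hyperedge and of one possible stability signal. It is not the paper's implementation. The names `Hyperedge`, `make_edge`, `stability`, and `unstable_edges` are hypothetical, and measuring stability as the fraction of repeated extraction runs that contain a fact is an assumption made only for illustration; the abstract does not define the measure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """One n-ary fact: a relation label over an unordered set of entities."""
    relation: str
    entities: frozenset


def make_edge(relation, *entities):
    """Build a hyperedge; two entities give an ordinary binary link."""
    return Hyperedge(relation, frozenset(entities))


def stability(runs):
    """Assumed stability measure: share of runs in which each edge appears."""
    counts = Counter(edge for run in runs for edge in set(run))
    return {edge: n / len(runs) for edge, n in counts.items()}


def unstable_edges(runs, threshold=1.0):
    """Edges whose assumed stability falls below the threshold."""
    scores = stability(runs)
    picked = [edge for edge, score in scores.items() if score < threshold]
    return sorted(picked, key=lambda edge: edge.relation)


# A ternary fact cannot be stored as a single binary triple without losing
# the link between the person, the prize, and the year.
award = make_edge("received_award", "Marie Curie", "Nobel Prize in Physics", "1903")
spouse = make_edge("spouse_of", "Marie Curie", "Pierre Curie")

# Three hypothetical extraction runs over the same document.
runs = [[award, spouse], [spouse], [award, spouse]]
scores = stability(runs)
candidates = unstable_edges(runs)
```

In this toy setup the binary link appears in every run while the ternary fact is missed once, so only the ternary fact is returned by `unstable_edges`. Under the stated assumption, such facts are the unstable traces from which a skill-acquisition step would try to derive new guidance.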

Paper Structure

This paper contains 39 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Illustration of the scenario gap between general prompting and domain-specific prompting. Using generic prompts in KGGen leads to suboptimal knowledge extraction. By contrast, applying domain-specific "adaptive prompts" substantially improves the model's ability to extract facts.
  • Figure 2: The overall architecture of our proposed Hyper-KGGen framework for high-quality knowledge hypergraph generation. It consists of two modules: (a) the Coarse-to-Fine Knowledge Hypergraph Extraction module, and (b) the Adaptive Diverse Scenarios Skill Acquisition module, which iteratively generates reusable skills from execution history and adds them to the Skill Library.
  • Figure 3: Precision-Recall Curves for $n$-ary Relation Extraction on the HyperDocRED Dataset.
  • Figure 4: Distribution of MINE scores across 100 articles for KGGen, Cog-RAG, and Hyper-KGGen. Dotted vertical lines show average performance.
  • Figure 5: Performance Scaling with Few-Shot Setting and Skill Size on the HyperDocRED Dataset.