Hyper-KGGen: A Skill-Driven Knowledge Extractor for High-Quality Knowledge Hypergraph Generation

Rizhuo Huang; Yifan Feng; Rundong Xue; Shihui Ying; Jun-Hai Yong; Chuan Shi; Shaoyi Du; Yue Gao

TL;DR

Hyper-KGGen is a skill-driven framework that reformulates knowledge hypergraph extraction as a dynamic skill-evolving process. It significantly outperforms strong baselines, validating that evolved skills provide substantially richer guidance than static few-shot examples in multi-scenario settings.

Abstract

Knowledge hypergraphs surpass traditional binary knowledge graphs by encapsulating complex $n$-ary atomic facts, providing a more comprehensive paradigm for semantic representation. However, constructing high-quality hypergraphs remains challenging due to the scenario gap: generic extractors struggle to generalize across diverse domains with specific jargon, while existing methods often fail to balance structural skeletons with fine-grained details. To bridge this gap, we propose Hyper-KGGen, a skill-driven framework that reformulates extraction as a dynamic skill-evolving process. First, Hyper-KGGen employs a coarse-to-fine mechanism to systematically decompose documents, ensuring full-dimensional coverage from binary links to complex hyperedges. Crucially, it incorporates an adaptive skill acquisition module that actively distills domain expertise into a Global Skill Library. This is achieved via a stability-based feedback loop, where extraction stability serves as a relative reward signal to induce high-quality skills from unstable traces and missed predictions. Additionally, we present HyperDocRED, a rigorously annotated benchmark for document-level knowledge hypergraph extraction. Experiments demonstrate that Hyper-KGGen significantly outperforms strong baselines, validating that evolved skills provide substantially richer guidance than static few-shot examples in multi-scenario settings.
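To make two ideas from the abstract concrete, the following is a minimal Python sketch of (1) an $n$-ary fact stored as a single hyperedge rather than several binary links, and (2) one possible reading of "extraction stability" as agreement across repeated extraction runs. Everything here is an assumption for illustration: the `Hyperedge` class, the `stability` and `split_by_stability` functions, the agreement-ratio definition, and the threshold are hypothetical and are not taken from the paper, which may define stability differently.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical container for one fact: a relation over 2 or more entities."""
    relation: str
    entities: frozenset[str]


def stability(rollouts: list[set[Hyperedge]]) -> dict[Hyperedge, float]:
    """Assumed definition: fraction of rollouts in which each candidate appears."""
    counts = Counter(edge for rollout in rollouts for edge in rollout)
    return {edge: n / len(rollouts) for edge, n in counts.items()}


def split_by_stability(
    scores: dict[Hyperedge, float], threshold: float
) -> tuple[set[Hyperedge], set[Hyperedge]]:
    """Separate candidates into stable ones and unstable ones."""
    stable = {edge for edge, score in scores.items() if score >= threshold}
    return stable, set(scores) - stable


# A binary link (two entities) and a ternary hyperedge (three entities).
born = Hyperedge("born in", frozenset({"Marie Curie", "Warsaw"}))
award = Hyperedge(
    "received award in year",
    frozenset({"Marie Curie", "Nobel Prize in Physics", "1903"}),
)

# Four simulated extraction runs over the same text (hand-written, not model output).
rollouts = [{born, award}, {born}, {born, award}, {born}]

scores = stability(rollouts)
stable, unstable = split_by_stability(scores, threshold=0.75)
```

In this toy setup the binary link appears in every run while the ternary hyperedge appears in only half of them, so the hyperedge lands in `unstable`. Under the assumed reading, such unstable candidates are the traces from which new skills would be induced.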

Paper Structure (39 sections, 8 equations, 5 figures, 5 tables)

Introduction
Related Work
Knowledge Extraction
Hypergraph Knowledge Extraction
Skill Distillation and Reuse
Preliminary and Definition
Methodology
Overview
Coarse-to-Fine Hypergraph Knowledge Extraction
Document Chunking
Entity Extraction
Coarse-to-Fine Hyperedge Extraction
Knowledge Deduplication
Adaptive Skill Acquisition for Diverse Scenarios
Parallel Rollout for Candidate Generation
...and 24 more sections

Figures (5)

Figure 1: Illustration of the scenario gap between generic prompting and domain-specific prompting. Generic prompts in KGGen lead to suboptimal knowledge extraction, whereas domain-specific "adaptive prompts" substantially improve the model's ability to extract facts.
Figure 2: The overall architecture of the proposed Hyper-KGGen framework for high-quality knowledge hypergraph generation. It consists of two modules: (a) the Coarse-to-Fine Knowledge Hypergraph Extraction module, and (b) the Adaptive Diverse Scenarios Skill Acquisition module, which iteratively generates reusable skills from execution history and adds them to the Skill Library.
Figure 3: Precision-Recall Curves for $n$-ary Relation Extraction on the HyperDocRED Dataset.
Figure 4: Distribution of MINE scores across 100 articles for KGGen, Cog-RAG, and Hyper-KGGen. Dotted vertical lines show average performance.
Figure 5: Performance Scaling with Few-Shot Setting and Skill Size on the HyperDocRED Dataset.
