Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graphs from Complex Documents

Mohammad Sadeq Abolhasani, Yang Ba, Yixuan He, Rong Pan

Abstract

Knowledge graph construction typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global organization, especially in long technical documents with dense, context-dependent information. We propose TRACE-KG (Text-dRiven schemA for Context-Enriched Knowledge Graphs), a multimodal framework that jointly constructs a context-enriched knowledge graph and an induced schema without assuming a predefined ontology. TRACE-KG captures conditional relations through structured qualifiers and organizes entities and relations using a data-driven schema that serves as a reusable semantic scaffold while preserving full traceability to the source evidence. Experiments show that TRACE-KG produces structurally coherent, traceable knowledge graphs and offers a practical alternative to both ontology-driven and schema-free construction pipelines.

Paper Structure

This paper contains 103 sections, 2 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: TRACE-KG pipeline. Multimodal documents are textualized to integrate non-text elements with narrative text (1) and segmented into chunks (2). Chunks are processed sequentially to extract entity mentions (3, EntRec) and consolidate them into resolved entities (4, EntRes). Resolved entities are grouped into entity classes (5, EntClsRec) and refined into a hierarchy of classes and class groups (6, EntClsRes). Given the resolved entities in each chunk, relation instances with qualifiers are extracted (7, RelRec) and canonicalized while inducing a relation schema (8, RelRes). Resolution stages (4, 6, 8) combine clustering with LLM-guided selection of available functions and deterministic execution. The outputs are a traceable context-enriched knowledge graph (9) and an induced hierarchical schema over entities and relations (10).
  • Figure 2: Failure modes behind raw retrieval accuracy on MINE-1. Left: Ret.Acc retained after discounting by connectivity and leakage. Middle: deviation of TriCR from the ideal value 1. Right: Ret.Acc versus EGU.
  • Figure 3: Normalized multi-metric profile on MINE-1. Leak is inverted, and compression is expressed as proximity to TriCR = 1 so that higher values are preferable on all axes.
  • Figure 4: Effective Graph Utilization (EGU) on MINE-1. Error bars indicate variability across benchmark instances.
  • Figure 5: Overview of the constructed context-enriched knowledge graph for the case-study document.
  • ...and 5 more figures
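
As a rough illustration of the data shapes implied by the Figure 1 caption, the sketch below models mentions, resolved entities, and a relation instance that carries qualifiers and a pointer back to its source chunk. It is a minimal sketch under stated assumptions, not the authors' implementation: every class, field, and function name here is hypothetical, fixed-size word windows stand in for segmentation, and case-folded string matching stands in for the clustering and LLM-guided resolution described in the caption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mention:
    text: str
    chunk_id: int  # provenance: the chunk this mention was found in


@dataclass
class Entity:
    name: str
    mentions: list[Mention] = field(default_factory=list)


@dataclass
class Relation:
    head: str
    predicate: str
    tail: str
    qualifiers: dict[str, str]  # conditions under which the relation holds
    chunk_id: int               # provenance: the chunk supporting the relation


def segment(text: str, max_words: int = 12) -> list[str]:
    """Hypothetical stand-in for segmentation: fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def resolve(mentions: list[Mention]) -> dict[str, Entity]:
    """Hypothetical stand-in for entity resolution: merge mentions whose
    case-folded text is identical, keeping every mention for traceability."""
    entities: dict[str, Entity] = {}
    for m in mentions:
        key = m.text.casefold()
        if key not in entities:
            entities[key] = Entity(name=key)
        entities[key].mentions.append(m)
    return entities


# Toy data, invented purely for illustration.
mentions = [Mention("Pump A", 0), Mention("pump a", 1), Mention("Valve B", 1)]
entities = resolve(mentions)
relation = Relation(
    head="pump a",
    predicate="feeds",
    tail="valve b",
    qualifiers={"condition": "during startup"},
    chunk_id=1,
)
```

Keeping `chunk_id` on both mentions and relations is one simple way to preserve the traceability to source evidence that the abstract emphasizes. The `qualifiers` mapping lets a conditional statement be stored as a single relation instance rather than being flattened into an unconditional triple.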