OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing

Yitao Li, Zhanlin Liu, Anuranjan Pandey, Muni Srikanth

Abstract

Organizing a large-scale knowledge graph into a typed property graph requires structural decisions -- which entities become nodes, which properties become edges, and what schema governs these choices. Existing approaches embed these decisions in pipeline code or extract relations ad hoc, producing schemas that are tightly coupled to their construction process and difficult to reuse for downstream ontology-level tasks. We present an ontology-oriented approach in which the schema is designed from the outset for ontology analysis, entity disambiguation, domain customization, and LLM-guided extraction -- not merely as a byproduct of graph building. The core mechanism is intrinsic-relational routing, which classifies every property as either intrinsic or relational and routes it to the corresponding schema module. This routing produces a declarative schema that is portable across storage backends and independently reusable. We instantiate the approach on the January 2026 Wikidata dump. A rule-based cleaning stage identifies a 34.6M-entity core set from the full dump, followed by iterative intrinsic-relational routing that assigns each property to one of 94 modules organized into 8 categories. With tool-augmented LLM support and human review, the schema reaches 93.3% category coverage and 98.0% module assignment among classified entities. Exporting this schema yields a property graph with 34.0M nodes and 61.2M edges across 38 relationship types. We validate the ontology-oriented claim through five applications that consume the schema independently of the construction pipeline: ontology structure analysis, benchmark annotation auditing, entity disambiguation, domain customization, and LLM-guided extraction.
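The routing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes, purely for illustration, that a property is relational when its Wikidata datatype is `wikibase-item` (its values point to other entities) and intrinsic otherwise; the paper's actual criterion may differ. The `Schema` class, the `route` function, and the `biography` module name are hypothetical; `government` and `politics` are module names taken from the Figure 3 caption, but which property belongs to which module is assumed here.

```python
from dataclasses import dataclass, field

# Assumption: the value datatype alone decides the route.
# The paper's real classification rule is not given in this section.
RELATIONAL_DATATYPES = {"wikibase-item"}


@dataclass
class Schema:
    # Each dict maps a module name to the property IDs routed to it.
    intrinsic: dict[str, list[str]] = field(default_factory=dict)
    relational: dict[str, list[str]] = field(default_factory=dict)


def route(prop_id: str, datatype: str, module: str, schema: Schema) -> str:
    """Classify one property and record it under the given module."""
    kind = "relational" if datatype in RELATIONAL_DATATYPES else "intrinsic"
    getattr(schema, kind).setdefault(module, []).append(prop_id)
    return kind


schema = Schema()
properties = [
    ("P569", "time", "biography"),           # date of birth (module name hypothetical)
    ("P39", "wikibase-item", "government"),  # position held (module assignment assumed)
    ("P102", "wikibase-item", "politics"),   # member of political party (assumed)
]
kinds = [route(pid, dtype, module, schema) for pid, dtype, module in properties]
print(kinds)
print(schema.relational)
```

In this sketch intrinsic properties would end up as node attributes and relational ones as typed edges, which is one way to read the abstract's split between nodes and edges. Because the result is plain data, it can be serialized and consumed without the code that produced it.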

Paper Structure

This paper contains 34 sections, 3 equations, 3 figures, 5 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Overview of the schema-centered knowledge graph ecosystem. A raw knowledge graph (left) is transformed through an intrinsic-relational classification framework into a typed property graph (right). The declarative schema (center) serves as the core artifact, enabling downstream applications including entity disambiguation, domain customization, and LLM-guided extraction.
  • Figure 2: Bipartite view of 8 categories (top) connected to 18 cross-category relational modules with span $\geq 3$ (bottom). Badges indicate category span. Two additional cross-category modules (affiliation and technology, each spanning 2 categories) are omitted for clarity. The full schema comprises 56 intrinsic modules, 18 single-category relational modules, and 20 cross-category relational modules (94 modules in total).
  • Figure 3: Governance domain subgraph extracted by selecting three relational modules: government (green), legal (red), and politics (purple). The subgraph spans four categories (People, Knowledge, Organizations, and Events), with edge labels showing the underlying Wikidata properties (e.g., position held, political party, member of).
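
The module-based selection described in the Figure 3 caption can be sketched as a filter over typed edges. This is a minimal illustration under stated assumptions, not the paper's export code. The `Edge` record, the `select_domain` function, the entity names, and the `education` module are all hypothetical, and the assignment of each property label to a module is assumed for the example.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    label: str   # label of the underlying Wikidata property
    target: str
    module: str  # relational module the property was routed to (assumed)


def select_domain(edges, modules):
    """Return the nodes and edges whose relational module is in `modules`."""
    chosen = set(modules)
    kept = [e for e in edges if e.module in chosen]
    nodes = {e.source for e in kept} | {e.target for e in kept}
    return nodes, kept


# Hypothetical toy graph; entity names are placeholders, not Wikidata items.
edges = [
    Edge("person_a", "position held", "office_x", "government"),
    Edge("person_a", "political party", "party_y", "politics"),
    Edge("person_a", "educated at", "university_z", "education"),
]

nodes, kept = select_domain(edges, ["government", "legal", "politics"])
print(sorted(nodes))
print([e.label for e in kept])
```

Selecting by module rather than by individual property lets a domain view be defined in a single line. That is presumably what lets the subgraph cross several categories without listing entity types explicitly.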