Inside CORE-KG: Evaluating Structured Prompting and Coreference Resolution for Knowledge Graphs

Dipak Meher, Carlotta Domeniconi

TL;DR

This paper examines CORE-KG, a modular pipeline for constructing knowledge graphs from unstructured legal texts about human smuggling, which combines type-aware coreference resolution with domain-guided structured prompting. An ablation study isolates the contribution of each component: removing coreference resolution increases node duplication by 28.25% and noisy nodes by 4.32%, while removing structured prompts increases node duplication by 4.29% and noisy nodes by 73.33%. Coreference resolution therefore mainly controls duplication and structured prompting mainly controls noise, so the two modules are complementary and both are needed. CORE-KG also produces denser, more coherent graphs than GraphRAG baselines. The work offers practical guidance for designing robust LLM-based KG pipelines in complex legal domains and points to future improvements in prompt design and cross-document coreference for unified graphs.

Abstract

Human smuggling networks are increasingly adaptive and difficult to analyze. Legal case documents offer critical insights but are often unstructured, lexically dense, and filled with ambiguous or shifting references, which pose significant challenges for automated knowledge graph (KG) construction. While recent LLM-based approaches improve over static templates, they still generate noisy, fragmented graphs with duplicate nodes due to the absence of guided extraction and coreference resolution. The recently proposed CORE-KG framework addresses these limitations by integrating a type-aware coreference module and domain-guided structured prompts, significantly reducing node duplication and legal noise. In this work, we present a systematic ablation study of CORE-KG to quantify the individual contributions of its two key components. Our results show that removing coreference resolution results in a 28.25% increase in node duplication and a 4.32% increase in noisy nodes, while removing structured prompts leads to a 4.29% increase in node duplication and a 73.33% increase in noisy nodes. These findings offer empirical insights for designing robust LLM-based pipelines for extracting structured representations from complex legal texts.

Paper Structure

This paper contains 30 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of the CORE-KG Pipeline. The pipeline begins with legal text inputs, processed via a type-aware coreference resolution module using sequential per-type prompting. The final resolved text is then passed through a structured prompting stage for entity and relationship extraction. The resulting triples are used to construct a coherent knowledge graph with significantly reduced duplication and noise.
  • Figure 2: Graphs of (a) CORE-KG; (b) CoreKG-no-coref; (c) CoreKG-no-struct-prompt, shown for Case 06. Duplicate entities are indicated with solid rectangles, noisy or irrelevant entities with solid ovals, and disconnected or weakly connected nodes with dashed ovals.
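
The three stages described in the Figure 1 caption (sequential per-type coreference resolution, structured-prompt extraction, graph construction) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the entity types, the prompt wording, the JSON triple format, and every function name are assumptions, and `fake_llm` is a toy stand-in so the example runs without a model.

```python
import json
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical entity types; the summary does not list the ones CORE-KG uses.
ENTITY_TYPES = ["PERSON", "LOCATION", "ORGANIZATION"]

Triple = Tuple[str, str, str]
LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def resolve_coreferences(text: str, llm: LLM,
                         entity_types: List[str] = ENTITY_TYPES) -> str:
    """Sequential per-type prompting: one pass per entity type, each pass
    receiving the text produced by the previous pass."""
    for entity_type in entity_types:
        prompt = (
            f"Rewrite the text so every mention of the same {entity_type} "
            f"uses one canonical name. Return only the rewritten text.\n\n{text}"
        )
        text = llm(prompt)
    return text


def extract_triples(text: str, llm: LLM) -> List[Triple]:
    """Structured prompt (assumed format): ask for a JSON list of triples."""
    prompt = (
        "Extract entities and relationships from the text below. Respond with "
        'a JSON list of objects with keys "head", "relation", "tail".\n\n' + text
    )
    records = json.loads(llm(prompt))
    return [(r["head"], r["relation"], r["tail"]) for r in records]


def build_graph(triples: List[Triple]) -> Dict[str, Set[Tuple[str, str]]]:
    """Adjacency map: node -> set of (relation, neighbour) pairs."""
    graph: Dict[str, Set[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        graph.setdefault(head, set()).add((relation, tail))
        graph.setdefault(tail, set())  # make sure leaf nodes exist too
    return graph


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a real model (made-up data, for illustration only)."""
    if prompt.startswith("Extract"):
        return json.dumps(
            [{"head": "John Doe", "relation": "drove_to", "tail": "Laredo"}]
        )
    # Coreference pass: take the text after the instructions and merge one alias.
    body = prompt.split("\n\n", 1)[1]
    return body.replace("the defendant", "John Doe")


resolved = resolve_coreferences(
    "John Doe met a driver, and the defendant drove to Laredo.", fake_llm
)
graph = build_graph(extract_triples(resolved, fake_llm))
print(resolved)
print(graph)
```

Because the model is passed in as a plain callable, either stage can be swapped out or skipped, which is the kind of change an ablation such as the no-coref and no-struct-prompt variants in Figure 2 would require.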