LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks

Dipak Meher, Carlotta Domeniconi, Guadalupe Correa-Cabrera

TL;DR

LINK-KG tackles the coreference and knowledge-graph construction challenge in long, narrative legal texts about human smuggling by introducing a memory-based, type-aware three-stage coreference pipeline guided by a persistent Prompt Cache. The approach combines NER with stage-wise cache construction and chunk-wise coreference resolution, followed by a GraphRAG-based KG extraction that enforces type-specific prompts and selective filtering to produce coherent graphs. Empirical results on 16 court cases show significant reductions in node duplication (average 45.21%) and noise (average 32.22%), demonstrating cleaner, more actionable graphs across short and long documents. The framework enables robust analysis of complex criminal networks and supports downstream tasks such as role attribution, temporal analysis, and event prediction in legal domains.

Abstract

Human smuggling networks are complex and constantly evolving, making them difficult to analyze comprehensively. Legal case documents offer rich factual and procedural insights into these networks but are often long, unstructured, and filled with ambiguous or shifting references, posing significant challenges for automated knowledge graph (KG) construction. Existing methods either overlook coreference resolution or fail to scale beyond short text spans, leading to fragmented graphs and inconsistent entity linking. We propose LINK-KG, a modular framework that integrates a three-stage, LLM-guided coreference resolution pipeline with downstream KG extraction. At the core of our approach is a type-specific Prompt Cache, which consistently tracks and resolves references across document chunks, enabling clean and disambiguated narratives for structured knowledge graph construction from both short and long legal texts. LINK-KG reduces average node duplication by 45.21% and noisy nodes by 32.22% compared to baseline methods, resulting in cleaner and more coherent graph structures. These improvements establish LINK-KG as a strong foundation for analyzing complex criminal networks.

Paper Structure

This paper contains 29 sections, 4 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the LINK-KG framework. The pipeline operates in three main phases for each entity type. In Phase 1, the legal text is split into chunks and passed through a NER-LLM to extract type-specific entities and their descriptions. In Phase 2, a Mapping-LLM takes these outputs along with previously cached mappings to iteratively update a prompt cache for each chunk, ensuring consistency through a “gleaning” step across all chunks. In Phase 3, the Resolve-LLM uses the final prompt cache to perform coreference resolution and generate resolved legal chunks, which are then merged to produce the resolved output for that entity type. This three-phase process is repeated sequentially for each entity type (Type 1 to Type N) to obtain the final coreference-resolved legal text.
  • Figure 2: Overview of the Knowledge Graph Construction Module. The coreference-resolved legal document is split into chunks. Each chunk is paired with a prompt and sent to an LLM to extract entities and relationships. The extracted information is then combined to build a structured knowledge graph.
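
The three phases in the Figure 1 caption can be sketched for a single entity type as follows. This is a minimal illustration, not the authors' implementation: the three LLM calls are replaced by toy stand-in functions, and every name here (`resolve_entity_type`, `toy_ner`, `toy_mapper`, `toy_resolver`, `KNOWN_MENTIONS`) is hypothetical. The prompt cache is assumed to be a plain mapping from a surface mention to a canonical name, and the "gleaning" step is left out because the caption does not specify how it works.

```python
import re
from typing import Callable, Dict, List

# Assumption: the prompt cache maps each surface mention to a canonical name.
PromptCache = Dict[str, str]

# Hypothetical mention list used by the toy NER stand-in below.
KNOWN_MENTIONS = ["Juan Lopez", "Mr. Lopez"]


def toy_ner(chunk: str) -> List[str]:
    """Stand-in for the NER-LLM: return the known mentions present in a chunk."""
    return [m for m in KNOWN_MENTIONS if m in chunk]


def toy_mapper(mentions: List[str], cache: PromptCache) -> PromptCache:
    """Stand-in for the Mapping-LLM: extend the cache with this chunk's mentions.

    A new mention is linked to an already cached canonical name that shares
    its last token; otherwise the mention becomes its own canonical name.
    """
    updated = dict(cache)
    for mention in mentions:
        if mention in updated:
            continue
        last_token = mention.split()[-1]
        match = next(
            (c for c in updated.values() if c.split()[-1] == last_token), None
        )
        updated[mention] = match if match is not None else mention
    return updated


def toy_resolver(chunk: str, cache: PromptCache) -> str:
    """Stand-in for the Resolve-LLM: rewrite cached mentions to canonical names."""
    if not cache:
        return chunk
    # Longest mentions first, so a short alias never matches inside a longer one.
    keys = sorted(cache, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: cache[m.group(0)], chunk)


def resolve_entity_type(
    chunks: List[str],
    ner: Callable[[str], List[str]],
    mapper: Callable[[List[str], PromptCache], PromptCache],
    resolver: Callable[[str, PromptCache], str],
) -> List[str]:
    """Run the three phases for one entity type and return the resolved chunks."""
    # Phase 1: extract type-specific mentions from every chunk.
    mentions_per_chunk = [ner(chunk) for chunk in chunks]
    # Phase 2: build the prompt cache chunk by chunk, carrying it forward.
    cache: PromptCache = {}
    for mentions in mentions_per_chunk:
        cache = mapper(mentions, cache)
    # Phase 3: resolve every chunk against the final cache.
    return [resolver(chunk, cache) for chunk in chunks]


chunks = [
    "Juan Lopez met the driver near the border.",
    "Mr. Lopez then collected the payment.",
]
resolved = resolve_entity_type(chunks, toy_ner, toy_mapper, toy_resolver)
print(" ".join(resolved))
```

Because the cache is carried across chunks, the alias in the second chunk is rewritten to the name introduced in the first. To follow the caption's sequential processing of Type 1 to Type N, one would call `resolve_entity_type` once per entity type and pass each call's output chunks to the next.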