Hallucination-Resistant Relation Extraction via Dependency-Aware Sentence Simplification and Two-tiered Hierarchical Refinement

Yupei Yang, Fan Feng, Lin Yang, Wanxi Deng, Lin Qu, Biwei Huang, Shikui Tu, Lei Xu

TL;DR

This paper tackles hallucination in large-language-model–based relation extraction by introducing DEPTH, a two-tiered framework that combines dependency-aware sentence simplification for per-pair grounding with a global Refinement stage to ensure sentence-wide consistency. A causal reward modeling approach is proposed to mitigate reward hacking in RLHF, enabling robust PPO-based fine-tuning. Empirical results across eight diverse benchmarks show DEPTH consistently reduces NO-RELATION hallucinations and yields substantial improvements in micro-F1, with strong cross-dataset transferability. Overall, DEPTH offers a practical, scalable solution for reliable, domain-general relation extraction in enterprise contexts.

Abstract

Relation extraction (RE) enables the construction of structured knowledge for many downstream applications. While large language models (LLMs) have shown great promise in this task, they often struggle to reliably determine whether a relation exists, particularly in sentences with complex syntax or subtle semantics. For instance, we find that Qwen2.5-14B-Instruct incorrectly predicts a relation in 96.9% of NO-RELATION instances on SciERC, revealing a severe hallucination problem. To address these challenges, we propose DEPTH, a framework that integrates Dependency-aware sEntence simPlification and Two-tiered Hierarchical refinement into the relation extraction pipeline. Given a sentence and its candidate entity pairs, DEPTH operates in two stages: (1) the Grounding module extracts relations for each pair by leveraging their shortest dependency path, distilling the sentence into a minimal yet coherent relational context that reduces syntactic noise while preserving key semantics; (2) the Refinement module aggregates all local predictions and revises them based on a holistic understanding of the sentence, correcting omissions and inconsistencies. We further introduce a causality-driven reward model that mitigates reward hacking by disentangling spurious correlations, enabling robust fine-tuning via reinforcement learning with human feedback. Experiments on eight well-established benchmarks demonstrate that DEPTH reduces the average hallucination rate to 7.9% while achieving a 9.3% improvement in average F1 score over existing LLM-based extraction baselines.
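The Grounding stage above relies on the shortest dependency path (SDP) between the two entities of a pair. As a minimal sketch of that idea only, the snippet below finds the SDP by breadth-first search over a dependency tree treated as an undirected graph. The function name `shortest_dependency_path`, the example sentence, and its hand-written parse are assumptions made for illustration; they are not taken from the paper, and a real pipeline would obtain the edges from a dependency parser and use token indices rather than word strings as node identifiers.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the node sequence from source to target, or None if unreachable.

    `edges` is a list of (head, dependent) pairs. Direction is ignored,
    because the SDP may climb from one entity to a shared ancestor and
    then descend to the other entity.
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical, hand-written parse of:
# "The model trained on SciERC outperforms the baseline"
edges = [
    ("outperforms", "model"),
    ("model", "The"),
    ("model", "trained"),
    ("trained", "on"),
    ("on", "SciERC"),
    ("outperforms", "baseline"),
    ("baseline", "the"),
]

sdp = shortest_dependency_path(edges, "model", "baseline")
print(sdp)
```

For the pair (model, baseline), the path skips the modifier "trained on SciERC", which is the kind of syntactic noise the simplification step aims to drop. How DEPTH turns the path into a coherent simplified sentence is not specified in this section and is not modeled here.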

Paper Structure

This paper contains 57 sections, 4 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Overall framework of DEPTH: the Grounding module uses simplification based on the shortest dependency path (SDP) and an RLHF-fine-tuned LLM to predict per-pair relations (only three are illustrated for brevity), while the Refinement module jointly refines all predictions using global context. Dependency parsing provides structural guidance throughout, and RLHF is applied only during training.
  • Figure 2: Illustration of the Dependency-aware Simplification module, which leverages the SDP to produce a concise relational context for extraction. The corresponding prompt template is provided in Appendix \ref{sec:ape_template}.
  • Figure 3: An example of hallucination in LLM-based extraction caused by co-occurrence.
  • Figure 4: Causal diagrams of the standard RM training process and our causal method.
  • Figure 5: (Q3) Pairwise accuracy during RM training. DEPTH$-$CRM refers to the ablation variant in which causal reward modeling is removed; details are provided in Section \ref{sec:ablation}.

Theorems & Definitions (1)

  • Definition 1: Hallucination Rate (HR)
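Only the title of Definition 1 appears here, so the snippet below is a sketch under an assumption rather than the paper's formula. It reads the hallucination rate the way the abstract uses the term: the fraction of gold NO-RELATION instances for which the model predicts some relation. The function name `hallucination_rate`, the label string, and the toy labels are illustrative choices.

```python
def hallucination_rate(gold, pred, no_relation="NO-RELATION"):
    """Fraction of gold NO-RELATION instances predicted as having a relation.

    Assumed reading of the metric, inferred from the abstract; the
    paper's Definition 1 may differ in detail.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Keep only the predictions made on gold NO-RELATION instances.
    negatives = [p for g, p in zip(gold, pred) if g == no_relation]
    if not negatives:
        return 0.0  # no NO-RELATION instances, so nothing to hallucinate on
    hallucinated = sum(1 for p in negatives if p != no_relation)
    return hallucinated / len(negatives)


# Toy labels, invented for illustration only.
gold = ["NO-RELATION", "USED-FOR", "NO-RELATION", "NO-RELATION", "PART-OF"]
pred = ["USED-FOR", "USED-FOR", "NO-RELATION", "PART-OF", "NO-RELATION"]

hr = hallucination_rate(gold, pred)
print(f"{hr:.3f}")
```

Missed relations (the last pair in the toy data) do not count toward this metric under the assumed reading; they affect recall and F1 instead.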