Peeking inside the Black-Box: Reinforcement Learning for Explainable and Accurate Relation Extraction

Xinyu Guo, Zhengliang Shi, Minglai Yang, Mahdi Rahimi, Mihai Surdeanu

TL;DR

CogRE tackles the challenge of making relation extraction both accurate and explainable in one-shot settings by decomposing reasoning into semantic chunking, keyword anchoring, and integrative reasoning, guided by cognitive psychology. It introduces a lightweight Hit@Dict reward, derived from LLM-generated explanations, to jointly supervise task accuracy and explanation quality via reinforcement learning. Training with Group Relative Policy Optimization further stabilizes learning and accelerates convergence, yielding substantial automatic metric gains and improved human-rated explanations on Few-shot TACRED and NYT29. The approach demonstrates that grounding reasoning in relational keywords and structured cognitive steps can produce concise, label-aligned explanations while boosting RE performance. This framework offers a scalable path to transparent, adaptable RE systems for high-stakes applications.

Abstract

This paper introduces a framework for relation extraction (RE) that enhances both accuracy and explainability. The framework has two key components: (i) a reasoning mechanism that formulates relation extraction as a series of text-processing steps inspired by cognitive science, and (ii) an optimization process driven by reinforcement learning (RL) with a novel reward function designed to improve both task accuracy and explanation quality. We call our approach CogRE. Our framework addresses the lack of supervision for language-based explanations in traditional RE by promoting outputs that include important relation keywords. These keywords are drawn from a high-quality dictionary that is automatically constructed using an LLM. We evaluate our approach for the task of one-shot RE using two LLMs and two RE datasets. Our experiments show that CogRE improves explanation quality by addressing two common failure patterns in one-shot RE: poor attention focus and limited one-shot learning capability. For example, our cognitive-structured reasoning with Qwen2.5-14B-Instruct on One-shot NYT29 achieves 24.65% F1, surpassing prior reasoning-based designs. Optimizing this approach with RL using our reward further improves performance by +23.46% (absolute). Finally, human evaluation shows that our best model generates relational keywords closely aligned with gold labels, increasing human explanation quality ratings by 54% (relative).

Paper Structure

This paper contains 34 sections, 8 equations, 2 figures, 13 tables, 2 algorithms.

Figures (2)

  • Figure 1: An overview of the CogRE framework. (a) Relational Keywords Dictionary: relational keywords are extracted from explanations of true-positive samples generated by untrained LLMs to build a dictionary (Alg. \ref{alg:build_keywords_dict}). (b) Reinforcement Learning with Hit@Dict: LLM outputs are scored by accuracy (answers) and Hit@Dict (explanations). (c) Example of Scoring with Hit@Dict: CogRE enables stepwise reasoning. Keywords in the dictionary are matched against the LLM output (Hit Times Table); the Hit@Dict reward is a normalized hit rate (Section \ref{sec:hit_reward}).
  • Figure 2: Training dynamics on the one-shot NYT29 dataset for Phi-4 and Qwen2.5-14B-Instruct. The Y-axes show reward, KL penalty, and response length. We compare reinforcement learning with the accuracy reward alone (Only Acc) and with the combined reward (Hit@Dict+Acc).
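
The Figure 1 caption describes Hit@Dict as a normalized rate of dictionary keywords matched in the model's explanation, used alongside an accuracy reward. The following is a minimal Python sketch of that idea, not the paper's implementation. The function names, the single-word keyword matching, the normalization by dictionary size, the additive combination, and the `weight` parameter are all assumptions made for illustration; the paper's exact definition is in its Hit@Dict section.

```python
import re


def hit_at_dict(explanation: str, keywords: set[str]) -> float:
    """Fraction of dictionary keywords that occur in the explanation.

    Assumption: keywords are single lowercase words and the hit count is
    normalized by the dictionary size for the relation.
    """
    if not keywords:
        return 0.0
    # Lowercase and split into alphanumeric tokens for whole-word matching.
    tokens = set(re.findall(r"[a-z0-9]+", explanation.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in tokens)
    return hits / len(keywords)


def combined_reward(
    predicted: str,
    gold: str,
    explanation: str,
    dictionary: dict[str, set[str]],
    weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> float:
    """Accuracy reward plus a weighted Hit@Dict term (assumed additive)."""
    accuracy = 1.0 if predicted == gold else 0.0
    hit = hit_at_dict(explanation, dictionary.get(gold, set()))
    return accuracy + weight * hit


# Toy dictionary and explanation, invented for this example.
dictionary = {"per:employee_of": {"works", "employed", "joined", "hired"}}
explanation = "Smith joined Acme in 2010 and works there as an engineer."

print(hit_at_dict(explanation, dictionary["per:employee_of"]))  # 0.5
print(combined_reward("per:employee_of", "per:employee_of", explanation, dictionary))  # 1.25
```

In this sketch an explanation still earns partial credit when the predicted label is wrong; whether the paper gates Hit@Dict on answer correctness is not stated in the caption, so treat that behavior as an open design choice.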