Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Jiaqing Liang

TL;DR

The paper addresses instability in LLM-based information extraction on complex inputs, where the extraction order and planning significantly affect the outputs. It proposes a two-stage framework that first classifies relation/event types and then extracts arguments sequentially. Planning is formulated as a Markov decision process and solved by a DDQN-trained decision model that guides an LLM-based extractor. The LLM extractor serves as the environment, paired with a binary semantic/token-level reward and a reward-driven training loop, which enables stable multi-step extraction without task-specific fine-tuning. Experiments on multiple Chinese and English IE datasets show improved precision, recall, and F1 compared with fixed-order or baseline prompting methods, particularly in complex extraction settings. The approach offers a general, model-collaborative method for robust IE that combines large language models with RL-based planning.

Abstract

Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, giving rise to issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performance, and that the order in which entities are extracted significantly affects the final results. This paper proposes a two-stage multi-step method for LLM-based information extraction and adopts a reinforcement learning (RL) framework to execute the multi-step planning. We regard sequential extraction as a Markov decision process, build an LLM-based extraction environment, design a decision module that adaptively provides the optimal order for sequential entity extraction on different sentences, and use the DDQN algorithm to train the decision model. We also design rewards and evaluation metrics suited to the extraction results of LLMs. We conduct extensive experiments on multiple public datasets to demonstrate the effectiveness of our method in improving the information extraction capabilities of LLMs.
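
To make the DDQN training signal mentioned above concrete, the following is a minimal sketch of the standard Double DQN target for a single transition. It is not taken from the paper: the function name `ddqn_target`, its parameters, and the default discount `gamma=0.9` are assumptions chosen for illustration. The online network selects the next action, and the target network evaluates it.

```python
from typing import Sequence


def ddqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.9,  # assumed discount factor, not from the paper
) -> float:
    """Standard Double DQN target for one transition (illustrative sketch).

    q_online_next[a] and q_target_next[a] are the Q-values of action a in the
    next state, as estimated by the online and target networks respectively.
    """
    # Terminal step, or no action left to take: the target is just the reward.
    if done or len(q_online_next) == 0:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


# Usage: the online network prefers action 1, which the target network values at 0.3.
target = ddqn_target(1.0, False, [0.2, 0.8, 0.5], [0.4, 0.3, 0.9], gamma=0.5)
```

Separating selection from evaluation is what distinguishes Double DQN from plain DQN, which would take the maximum of the target network's values directly and tends to overestimate them.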

Paper Structure

This paper contains 22 sections, 5 equations, 2 figures, 4 tables, and 1 algorithm.

Figures (2)

  • Figure 1: (a) Extraction orders influence the outputs. (b) Planning and extracting are entangled for each model.
  • Figure 2: The main workflow of our method. The left side shows the extraction process, comprising relation/event classification and argument extraction. The right side shows the planning component, which guides each step using the BERT-based Q-Network.
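
The workflow in Figure 2 can be sketched as a greedy loop in which a Q-function picks the next argument to extract and an extractor fills it in. This is a simplified illustration under stated assumptions, not the authors' implementation: `plan_and_extract`, `toy_q_value`, and `toy_extract` are hypothetical names, and the toy functions stand in for the BERT-based Q-Network and the LLM extractor.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]  # argument role -> text extracted so far


def plan_and_extract(
    sentence: str,
    roles: List[str],
    q_value: Callable[[str, State, str], float],
    extract: Callable[[str, State, str], str],
) -> Tuple[List[str], State]:
    """Repeatedly pick the remaining role with the highest Q-value, then extract it."""
    extracted: State = {}
    order: List[str] = []
    remaining = list(roles)
    while remaining:
        # Planning step: score each remaining role given what is already extracted.
        role = max(remaining, key=lambda r: q_value(sentence, extracted, r))
        # Extraction step: the extractor sees the sentence and earlier results.
        extracted[role] = extract(sentence, extracted, role)
        order.append(role)
        remaining.remove(role)
    return order, extracted


# Hypothetical stand-in for the Q-Network: fixed scores, with the object
# becoming more attractive once the subject is known.
def toy_q_value(sentence: str, extracted: State, role: str) -> float:
    base = {"subject": 0.9, "object": 0.2, "time": 0.5}[role]
    bonus = 0.5 if role == "object" and "subject" in extracted else 0.0
    return base + bonus


# Hypothetical stand-in for the LLM extractor: a fixed lookup table.
def toy_extract(sentence: str, extracted: State, role: str) -> str:
    return {"subject": "Alice", "object": "Acme", "time": "2020"}[role]


order, result = plan_and_extract(
    "Alice joined Acme in 2020.", ["object", "time", "subject"], toy_q_value, toy_extract
)
# order is ["subject", "object", "time"]: the order adapts to the extraction state.
```

Because the Q-function receives the partial extraction state, the chosen order can differ from sentence to sentence, which is the adaptive behavior the planning component is meant to provide.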