Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction
Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Jiaqing Liang
TL;DR
The paper addresses the instability of LLM-based information extraction (IE) on complex inputs, where the extraction order and planning significantly affect the outputs. It proposes a two-stage framework that first classifies relation/event types and then extracts arguments sequentially. Planning is formulated as a Markov decision process (MDP) and solved by a DDQN-trained decision model that guides an LLM-based extractor. The LLM extractor forms the environment, the decision model acts as the agent, and a binary semantic/token-level reward drives the training loop, enabling stable multi-step extraction without task-specific fine-tuning of the LLM. Experiments on multiple Chinese and English IE datasets show improved precision, recall, and F1 over fixed-order and baseline prompting methods, particularly in complex extraction settings. The approach offers a general, model-collaborative method for robust IE that combines large language models with RL-based planning.
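To make the MDP formulation concrete, here is a minimal sketch of a sequential-extraction environment, assuming a role-by-role setting in which the state is the set of arguments extracted so far and each action picks the next role to extract. The names `ExtractionEnv`, `toy_extractor`, and `rollout` are hypothetical; the toy extractor is a hard-coded stand-in for the LLM, and exact string matching is an assumed simplification of the paper's semantic/token-level reward.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the LLM extractor: (sentence, role, context) -> argument or None.
ExtractFn = Callable[[str, str, Dict[str, str]], Optional[str]]


def normalize(text: Optional[str]) -> str:
    # Assumption: lowercase exact match replaces semantic/token-level matching.
    return (text or "").strip().lower()


@dataclass
class ExtractionEnv:
    sentence: str
    gold: Dict[str, str]  # role -> gold argument
    extract_fn: ExtractFn
    extracted: Dict[str, str] = field(default_factory=dict)

    def remaining(self) -> List[str]:
        # Valid actions: roles that have not been extracted yet.
        return [r for r in self.gold if r not in self.extracted]

    def step(self, role: str) -> Tuple[float, bool]:
        if role not in self.remaining():
            raise ValueError(f"role {role!r} already extracted or unknown")
        # The extractor sees a copy of what has been extracted so far.
        pred = self.extract_fn(self.sentence, role, dict(self.extracted))
        self.extracted[role] = pred or ""
        # Binary reward: 1.0 if the prediction matches the gold argument, else 0.0.
        reward = 1.0 if normalize(pred) == normalize(self.gold[role]) else 0.0
        return reward, not self.remaining()


def toy_extractor(sentence: str, role: str, context: Dict[str, str]) -> Optional[str]:
    # Hypothetical order-sensitive rule: the victim is found only once the attacker is known.
    if role == "attacker":
        return "rebels"
    if role == "victim":
        return "villagers" if "attacker" in context else None
    return None


def rollout(order: List[str]) -> float:
    env = ExtractionEnv(
        sentence="Rebels attacked the villagers.",
        gold={"attacker": "rebels", "victim": "villagers"},
        extract_fn=toy_extractor,
    )
    total = 0.0
    for role in order:
        reward, done = env.step(role)
        total += reward
        if done:
            break
    return total


print(rollout(["attacker", "victim"]))  # 2.0
print(rollout(["victim", "attacker"]))  # 1.0
```

Because the toy extractor recovers the victim only after the attacker is known, the two orders earn different cumulative rewards; this order sensitivity is what a learned decision model would exploit.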
Abstract
Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, exhibiting issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting step by step can effectively improve LLMs' performance, and that the order in which entities are extracted significantly affects the final results. This paper proposes a two-stage, multi-step method for LLM-based information extraction and adopts a reinforcement learning (RL) framework to execute the multi-step planning. We regard sequential extraction as a Markov decision process, build an LLM-based extraction environment, design a decision module that adaptively provides the optimal entity-extraction order for each sentence, and use the Double Deep Q-Network (DDQN) algorithm to train the decision model. We also design rewards and evaluation metrics suited to the extraction results of LLMs. We conduct extensive experiments on multiple public datasets to demonstrate the effectiveness of our method in improving the information extraction capabilities of LLMs.
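The DDQN training step mentioned in the abstract can be sketched with the standard Double DQN target, in which the online network selects the next action and the target network evaluates it. This is the generic textbook target rather than code from the paper; the function name `ddqn_target`, the mask restricting the argmax to roles not yet extracted, and the default `gamma` are assumptions.

```python
from typing import Sequence


def ddqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    valid_next: Sequence[bool],
    gamma: float = 0.9,
) -> float:
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    # Terminal state, or no valid action left: no bootstrapped term.
    if done or not any(valid_next):
        return reward
    # Assumption: only roles not yet extracted are eligible next actions.
    best = max(
        (a for a, ok in enumerate(valid_next) if ok),
        key=lambda a: q_online_next[a],
    )
    # The online net chooses the action; the target net scores it.
    return reward + gamma * q_target_next[best]


print(ddqn_target(
    reward=1.0,
    done=False,
    q_online_next=[0.2, 0.8, 0.5],
    q_target_next=[0.75, 0.5, 1.0],
    valid_next=[True, True, False],
    gamma=0.5,
))  # 1.25
```

Here the online network picks action 1 and the target network scores it at 0.5. A plain DQN target would instead take the target network's own maximum over valid actions (0.75). Decoupling selection from evaluation is how DDQN reduces Q-value overestimation.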
