RUIE: Retrieval-based Unified Information Extraction using Large Language Model

Xincheng Liao; Junwen Duan; Yixi Huang; Jianxin Wang

TL;DR

RUIE is a retrieval-based UIE framework that enables efficient out-of-distribution generalization for NER, RE, and EE by coupling a trainable bi-encoder retriever with LLM-driven in-context learning. A novel demonstration selection mechanism combines LLM preferences with a keyword-enhanced reward to guide multi-task example retrieval, while a contrastive loss and knowledge distillation align the retriever with the reward model. Experiments cover 31 held-in and 8 held-out datasets; on the held-out datasets, RUIE improves the average F1-score by 19.22 over instruction-tuning methods and by 3.22 over other retrievers. The approach reduces computation by avoiding full LLM fine-tuning and works with various LLMs, making UIE more scalable and practical for real-world deployment. Limitations include sentence-length constraints, a performance gap to SFT methods on seen tasks, and English-only evaluation, which point to multilingual and long-document UIE as directions for future work.

Abstract

Unified information extraction (UIE) aims to extract diverse structured information from unstructured text. While large language models (LLMs) have shown promise for UIE, they require significant computational resources and often struggle to generalize to unseen tasks. We propose RUIE (Retrieval-based Unified Information Extraction), a framework that leverages in-context learning for efficient task generalization. RUIE introduces a novel demonstration selection mechanism combining LLM preferences with a keyword-enhanced reward model, and employs a bi-encoder retriever trained through contrastive learning and knowledge distillation. As the first trainable retrieval framework for UIE, RUIE serves as a universal plugin for various LLMs. Experimental results on eight held-out datasets demonstrate RUIE's effectiveness, with average F1-score improvements of 19.22 and 3.22 compared to instruction-tuning methods and other retrievers, respectively.
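The retriever training objective mentioned in the abstract (contrastive learning plus knowledge distillation from the reward model) can be illustrated with a minimal sketch. It assumes a single query with a small list of candidate demonstrations, where the first candidate is treated as the positive. The function names, the temperature, the weighting factor `alpha`, and the way the two terms are summed are assumptions made for illustration, not the paper's exact formulation.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_loss(retriever_scores, positive_index=0, temperature=1.0):
    """InfoNCE-style loss: negative log-probability of the positive candidate."""
    probs = softmax(retriever_scores, temperature)
    return -math.log(probs[positive_index])

def distillation_loss(reward_scores, retriever_scores):
    """KL(teacher || student) between the two softmax distributions."""
    p = softmax(reward_scores)      # teacher: reward model scores
    q = softmax(retriever_scores)   # student: bi-encoder similarity scores
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def retriever_loss(retriever_scores, reward_scores, alpha=0.5):
    """Hypothetical weighted sum of the contrastive and distillation terms."""
    return (contrastive_loss(retriever_scores)
            + alpha * distillation_loss(reward_scores, retriever_scores))
```

In this sketch the distillation term vanishes when the retriever's score distribution already matches the reward model's, so only the contrastive term keeps pushing the positive candidate above the negatives.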

Paper Structure (27 sections, 4 equations, 4 figures, 13 tables)

Introduction
Related Works
Unified Information Extraction
In-context Learning based Information Extraction
Methods
Problem Statement
LLM Preference Scoring
Keyword-enhanced Reward
UIE Retriever Training
Experiment Setup
Datasets
Metrics
Baseline Methods
Implementation Details
Results and Analyses
...and 12 more sections

Figures (4)

Figure 1: Illustration of three different paradigms for solving the unified information extraction task.
Figure 2: The overall architecture of RUIE. The training process consists of three steps: 1) the sparse retriever BM25 initializes a candidate set, which is then scored by the LLM; 2) a keyword-enhanced reward model captures fine-grained information (the keyword-enhanced strategy applies only to the input field of the example); 3) a bi-encoder dense retriever is trained using contrastive learning and knowledge distillation. During inference, the trained dense retriever selects the best demonstrations from the candidate pool $P$ and passes them to the LLM to produce the output.
Figure 3: Performance (in F1-score) comparison by varying k-shot demonstrations.
Figure 4: Inference speed (instances per second) comparison for different numbers of k-shot demonstrations on the ACE 2004 (NER), ADE corpus (RE), and ACE 2005 (ED and EAE) datasets.
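The inference step described in the Figure 2 caption (the dense retriever selects demonstrations from the candidate pool and passes them to the LLM) can be sketched as follows. This is a toy illustration only: the embeddings are hand-written vectors standing in for bi-encoder outputs, similarity is a plain dot product, and `select_demonstrations`, `build_prompt`, and the prompt layout are hypothetical names and choices rather than the paper's implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def select_demonstrations(query_vec, pool, k=2):
    """Return the k pool entries that score highest against the query.

    pool is a list of (embedding, input_text, output_text) tuples.
    """
    ranked = sorted(pool, key=lambda item: dot(query_vec, item[0]), reverse=True)
    return ranked[:k]

def build_prompt(instruction, demonstrations, query_text):
    """Assemble an in-context prompt: instruction, demonstrations, then the query."""
    parts = [instruction]
    for _, demo_input, demo_output in demonstrations:
        parts.append(f"Input: {demo_input}\nOutput: {demo_output}")
    parts.append(f"Input: {query_text}\nOutput:")
    return "\n\n".join(parts)

# Toy candidate pool with made-up embeddings and examples.
pool = [
    ([1.0, 0.0], "Alice works at Acme.", "person: Alice; organization: Acme"),
    ([0.0, 1.0], "Aspirin caused nausea.", "adverse effect: (Aspirin, nausea)"),
    ([0.9, 0.1], "Bob joined Initech.", "person: Bob; organization: Initech"),
]
demos = select_demonstrations([1.0, 0.0], pool, k=2)
prompt = build_prompt("Extract the entities.", demos, "Carol leads Globex.")
```

Varying `k` here corresponds to the k-shot settings compared in Figures 3 and 4: more demonstrations lengthen the prompt, which is the trade-off between F1-score and inference speed that those figures examine.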
