AutoRE: Document-Level Relation Extraction with Large Language Models

Lilong Xue, Dan Zhang, Yuxiao Dong, Jie Tang

TL;DR

AutoRE introduces the Relation-Head-Facts (RHF) paradigm for document-level relation extraction and implements it as three PEFT-based QLoRA modules on a Mistral-7B backbone. By decomposing extraction into relation, head, and fact steps with instruction-tuning templates, AutoRE achieves state-of-the-art results on Re-DocRED and generalizes well across different backbone LLMs. The approach targets the DocRE-specific challenge of handling many relations and multiple triplets spread across a document while remaining computationally efficient. Limitations include reliance on a fixed relation vocabulary and in-domain training data; future work aims to broaden relation coverage and handle unseen relations. Code and a demo are publicly available.
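The efficiency of QLoRA modules comes from LoRA-style adapters: the pretrained weight W stays frozen and only a low-rank update B·A is trained. The pure-Python sketch below shows the adapted forward pass and the trainable-parameter count. The function names, the rank r = 8, and the 4096 × 4096 projection (the size of a Mistral-7B query projection) are illustrative assumptions, not AutoRE's reported hyperparameters, and the 4-bit NF4 quantization of W that distinguishes QLoRA from plain LoRA is not modeled here.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 alpha: float, r: int) -> List[float]:
    """LoRA forward pass (illustrative sketch): h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A and B combined, versus d_in * d_out for full fine-tuning."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # toy frozen weight (identity)
    A = [[1.0, 1.0]]               # rank r = 1, d_in = 2
    B = [[1.0], [0.0]]             # d_out = 2
    print(lora_forward(W, A, B, [1.0, 2.0], alpha=2.0, r=1))  # [7.0, 2.0]
    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, r=8)  # hypothetical rank
    print(lora, full, f"{lora / full:.2%}")        # 65536 16777216 0.39%
```

Under these assumptions, a rank-8 adapter on one 4096 × 4096 projection trains 65,536 parameters instead of 16,777,216 (about 0.39%), which suggests why several task-specific modules can share one frozen backbone at modest cost.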

Abstract

Large Language Models (LLMs) have demonstrated exceptional abilities in comprehending and generating text, motivating many researchers to apply them to Information Extraction (IE), including Relation Extraction (RE). However, most existing methods are designed for Sentence-level Relation Extraction (SentRE) tasks, which typically involve a limited set of relations and triplet facts within a single sentence. Furthermore, some approaches treat relations as candidate choices embedded in prompt templates, which leads to inefficient processing and suboptimal performance on Document-Level Relation Extraction (DocRE) tasks. DocRE poses distinct challenges because it requires handling multiple relations and triplet facts distributed across an entire document. To overcome these limitations, we introduce AutoRE, an end-to-end DocRE model that adopts a novel RE paradigm named RHF (Relation-Head-Facts). Unlike existing approaches, AutoRE does not assume that the candidate relation options are known in advance, which better reflects real-world scenarios. We also develop an easily extensible RE framework based on a Parameter-Efficient Fine-Tuning (PEFT) algorithm, QLoRA. Experiments on the Re-DocRED dataset show that AutoRE achieves state-of-the-art results, surpassing TAG by 10.03% and 9.03% on the dev and test sets, respectively. The code is available at https://github.com/THUDM/AutoRE and the demonstration video is provided at https://www.youtube.com/watch?v=IhKRsZUAxKk.
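The RHF ordering can be sketched as three nested stages, each conditioned on the output of the previous one. In this sketch, `rhf_extract` and the `toy_*` stage functions are hypothetical: in AutoRE each stage would be a call to a fine-tuned LLM module with its own instruction template, whereas here they are fixed lookups so the control flow runs without a model. The only input is the document itself; no list of candidate relations is supplied.

```python
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def rhf_extract(
    document: str,
    extract_relations: Callable[[str], List[str]],
    extract_heads: Callable[[str, str], List[str]],
    extract_facts: Callable[[str, str, str], List[str]],
) -> List[Triplet]:
    """Relation -> Head -> Facts extraction; no candidate relation list is passed in."""
    triplets: List[Triplet] = []
    for relation in extract_relations(document):                  # 1. which relations does the document express?
        for head in extract_heads(document, relation):            # 2. which heads take part in this relation?
            for tail in extract_facts(document, relation, head):  # 3. which tails complete (head, relation)?
                triplets.append((head, relation, tail))
    return triplets


# Hypothetical toy stages standing in for the three fine-tuned LLM modules.
def toy_relations(doc: str) -> List[str]:
    return ["place of birth", "spouse"]


def toy_heads(doc: str, relation: str) -> List[str]:
    return ["Marie Curie"]


def toy_facts(doc: str, relation: str, head: str) -> List[str]:
    return {"place of birth": ["Warsaw"], "spouse": ["Pierre Curie"]}.get(relation, [])


if __name__ == "__main__":
    doc = "Marie Curie was born in Warsaw. She later married Pierre Curie."
    for t in rhf_extract(doc, toy_relations, toy_heads, toy_facts):
        print(t)
    # ('Marie Curie', 'place of birth', 'Warsaw')
    # ('Marie Curie', 'spouse', 'Pierre Curie')
```

In this sketch, the number of head and fact calls grows with the relations and heads actually found in the document, rather than with the size of a fixed relation inventory enumerated in the prompt.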

Paper Structure

This paper contains 11 sections, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Results on the Re-DocRED test set. AutoRE (-A) achieves SOTA across different LLMs.
  • Figure 2: Processing steps of different RE paradigms.
  • Figure 3: The homepage of online AutoRE.
  • Figure 4: Performance of different paradigms and AutoRE (-A) for different PLMs.
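The comparisons in Figures 1 and 4 are scores on Re-DocRED. DocRED-style benchmarks are conventionally evaluated with micro-averaged F1 over (head, relation, tail) triplets; the captions do not name the metric, so that is an assumption here. The sketch below (the name `triplet_f1` is hypothetical) computes exact-match triplet F1; official DocRED-style scoring adds details, such as the Ign F1 variant, that it omits.

```python
from typing import Iterable, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def triplet_f1(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> float:
    """Micro F1 over exact-match (head, relation, tail) triplets."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    if correct == 0:
        return 0.0  # also covers empty predictions, avoiding division by zero
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("Marie Curie", "place of birth", "Warsaw"),
            ("Marie Curie", "spouse", "Pierre Curie"),
            ("Pierre Curie", "spouse", "Marie Curie")]
    pred = [("Marie Curie", "place of birth", "Warsaw"),
            ("Marie Curie", "place of birth", "Paris")]
    print(round(triplet_f1(pred, gold), 4))  # P = 1/2, R = 1/3, F1 = 0.4
```

Using sets means duplicate predictions are counted once, which matches the usual treatment of a document's triplets as a set of facts.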