On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations

Shiao Meng; Xuming Hu; Aiwei Liu; Fukun Ma; Yawen Yang; Shuang Li; Lijie Wen

TL;DR

This work investigates how robust document-level relation extraction (DocRE) models are to entity name variations and shows that performance drops substantially when entities are renamed. It introduces a principled Wikidata-based pipeline for generating entity-renamed documents and uses it to construct two robustness benchmarks, Env-DocRED and Env-Re-DocRED. Evaluation on these benchmarks reveals that both standard DocRE models and in-context learning LLMs struggle, especially with cross-sentence relations and entity-dense documents. To address this, the authors propose Entity Variation Robust Training (EVRT), a data-augmentation and consistency-regularization framework that improves robustness while also enhancing understanding and reasoning capabilities, and whose basic idea transfers to in-context learning via prompt optimization. The findings provide practical benchmarks and methods for building DocRE systems that are more robust to the wide variety of entity names encountered in real-world scenarios.

Abstract

Driven by the demand for cross-sentence and large-scale relation extraction, document-level relation extraction (DocRE) has attracted increasing research interest. Despite continuous improvements in performance, we find that existing DocRE models that initially perform well may make more mistakes when only the entity names in a document are changed, which hinders generalization to novel entity names. To this end, we systematically investigate the robustness of DocRE models to entity name variations. We first propose a principled pipeline that generates entity-renamed documents by replacing the original entity names with names from Wikidata. By applying the pipeline to the DocRED and Re-DocRED datasets, we construct two novel benchmarks, Env-DocRED and Env-Re-DocRED, for robustness evaluation. Experimental results show that three representative DocRE models and two in-context learned large language models all consistently lack sufficient robustness to entity name variations, particularly on cross-sentence relation instances and documents with more entities. Finally, we propose an entity variation robust training method that not only improves the robustness of DocRE models but also enhances their understanding and reasoning capabilities. We further verify that the basic idea of this method can be effectively transferred to in-context learning for DocRE as well.
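
As a rough illustration of the renaming idea described above, the following minimal sketch substitutes every mention of an entity with a replacement name while leaving the rest of the document untouched. It is not the authors' pipeline: the `Mention` class, the `rename_entities` function, the token-span representation, and the example names are all hypothetical, and the replacement names are hard-coded here rather than retrieved from Wikidata.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    entity_id: str  # hypothetical identifier shared by all mentions of one entity
    start: int      # token start index (inclusive)
    end: int        # token end index (exclusive)


def rename_entities(tokens, mentions, replacements):
    """Return a copy of `tokens` with mention spans replaced by new names.

    `replacements` maps entity_id -> {original surface form -> new surface form},
    so each alias of an entity is swapped for a corresponding new alias.
    Mentions without a matching replacement are left unchanged.
    """
    out = list(tokens)
    # Work right to left so that replacing a span never shifts the
    # indices of spans that still have to be processed.
    for m in sorted(mentions, key=lambda m: m.start, reverse=True):
        surface = " ".join(tokens[m.start:m.end])
        new_name = replacements.get(m.entity_id, {}).get(surface)
        if new_name is not None:
            out[m.start:m.end] = new_name.split()
    return out


# Made-up example document and names (assumptions for illustration only).
tokens = "Alice Smith works at Acme . Smith founded Acme".split()
mentions = [
    Mention("e1", 0, 2), Mention("e2", 4, 5),
    Mention("e1", 6, 7), Mention("e2", 8, 9),
]
replacements = {
    "e1": {"Alice Smith": "Maria Lopez", "Smith": "Lopez"},
    "e2": {"Acme": "Globex Corp"},
}
renamed = rename_entities(tokens, mentions, replacements)
print(" ".join(renamed))
```

In this sketch the relation annotations would be keyed by `entity_id` rather than by surface name, so they carry over to the renamed document without modification; that is an assumption of the illustration, not a detail stated in the abstract.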

Paper Structure (29 sections, 5 equations, 5 figures, 9 tables)

Introduction
Related Work
Document-Level Relation Extraction.
Robustness and Entity-Related Robustness in NLP.
Robustness of DocRE Models.
Problem Formulation
Benchmark Construction
Construction Pipeline
Step 1: Entity Linking Based on Wikidata.
Step 2: Fine-grained Entity Typing.
Step 3: Alias-count-matched Candidate Entity Retrieval.
Step 4: Alias-wise Entity Mention Name Substitution.
Env-DocRED and Env-Re-DocRED Benchmarks
Robustness Evaluation and Analysis
Selected Models and Evaluation Metrics
...and 14 more sections

Figures (5)

Figure 1: An illustration of how minor changes in entity names mislead the DocRE model to wrong predictions.
Figure 2: The proposed pipeline for generating documents with changed entity names.
Figure 3: Evaluation results on the test sets of four benchmarks. Since the test set of DocRED is unpublished, the Ign F1 results on Env-DocRED are not accurate and are marked with "*"; the same applies to the main test results table.
Figure 4: F1 scores of NCRL-BERT$_{\mathrm{base}}$ on documents with different numbers of entities.
Figure 5: MAP curves of NCRL-BERT$_{\mathrm{base}}$ and NCRL-BERT$_{\mathrm{base}}$ + EVRT.
