How Fragile is Relation Extraction under Entity Replacements?

Yiwei Wang, Bryan Hooi, Fei Wang, Yujun Cai, Yuxuan Liang, Wenxuan Zhou, Jing Tang, Manjuan Duan, Muhao Chen

TL;DR

Problem: RE models often rely on entity names due to entity bias, limiting generalization. Approach: ENTRE for generating RE data with type-constrained random replacements and ENTRED as a challenging benchmark to audit RE models. Contributions: empirical evidence of 30-50% F1 drops under replacements, reduction of shortcuts with ENTRED, and the strongest gains from CoRE debiasing, plus release of code. Impact: motivates development of context-based reasoning in RE and provides scalable evaluation tools for robust RE in real-world settings.

Abstract

Relation extraction (RE) aims to extract the relations between entity names from the textual context. In principle, the textual context determines the ground-truth relation, and RE models should be able to correctly identify the relations reflected by that context. However, existing work has found that RE models memorize entity name patterns to make predictions while ignoring the textual context. This motivates the question: "Are RE models robust to entity replacements?" In this work, we apply random and type-constrained entity replacements to the RE instances in TACRED and evaluate state-of-the-art RE models under these replacements. We observe F1 score drops of 30% to 50% on state-of-the-art RE models under entity replacements. These results suggest that more effort is needed to develop effective RE models that are robust to entity replacements. We release the source code at https://github.com/wangywUST/RobustRE.
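
The replacement procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: the field names (`token`, `subj_start`, `subj_end`, `subj_type`, and the `obj_*` counterparts), the `lexicon` mapping from entity type to candidate names, and the function `replace_entities` are all assumptions made for illustration. The idea is to swap each entity mention for a random name of the same type while leaving the context and the relation label untouched.

```python
import random

def replace_entities(instance, lexicon, rng):
    """Return a copy of `instance` with subject and object names swapped
    for random names of the same entity type. The relation label is kept.
    Assumes inclusive token spans that do not overlap (hypothetical layout)."""
    tokens = list(instance["token"])
    spans = [
        ("subj", instance["subj_start"], instance["subj_end"], instance["subj_type"]),
        ("obj", instance["obj_start"], instance["obj_end"], instance["obj_type"]),
    ]
    new_spans = {}
    # Edit the rightmost span first so the left span's indices stay valid.
    for role, start, end, etype in sorted(spans, key=lambda s: s[1], reverse=True):
        new_name = rng.choice(lexicon[etype]).split()
        tokens = tokens[:start] + new_name + tokens[end + 1:]
        shift = len(new_name) - (end - start + 1)
        # Spans handled earlier lie to the right, so they move by `shift`.
        for other in list(new_spans):
            s, e = new_spans[other]
            new_spans[other] = (s + shift, e + shift)
        new_spans[role] = (start, start + len(new_name) - 1)
    out = dict(instance, token=tokens)
    for role, (s, e) in new_spans.items():
        out[f"{role}_start"], out[f"{role}_end"] = s, e
    return out

# Hypothetical instance and lexicon, for illustration only.
example = {
    "token": ["Bill", "Gates", "founded", "Microsoft", "."],
    "subj_start": 0, "subj_end": 1, "subj_type": "PERSON",
    "obj_start": 3, "obj_end": 3, "obj_type": "ORGANIZATION",
    "relation": "org:founded_by",
}
lexicon = {"PERSON": ["Ada Lovelace Byron"], "ORGANIZATION": ["Acme Corp"]}
replaced = replace_entities(example, lexicon, random.Random(0))
```

Because only the entity names change, a model that reads the context should predict the same relation for `example` and `replaced`; a model that relies on memorized names may not.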


Paper Structure

This paper contains 20 sections, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: The performance of state-of-the-art RE models drops substantially under entity replacements (ENTRE).
  • Figure 2: TACRED offers many shortcuts from entity names to ground-truth relations in the test set, where the model predicts the correct relation even when only given the entity names, despite all textual context being removed. As a result, TACRED is not challenging enough to measure the generalization under entity bias.
  • Figure 3: Two examples of incorrect entity annotations in TACRED.
  • Figure 4: The number of different subject entity names (red) is much lower than the number of instances (blue) in the test sets of the TACRED, TACREV, and Re-TACRED datasets. In other words, the diversity of entity names in these datasets' test sets is limited.
  • Figure 5: The original causal graph of RE models (left) together with its counterfactual alternatives for the entity bias (right). The shading indicates the mask of corresponding variables.
  • ...and 3 more figures
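
The two dataset diagnostics named in the captions of Figures 2 and 4 can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's code: the instance layout (`token` plus inclusive `subj_start`/`subj_end` and `obj_start`/`obj_end` indices) and the helper names are hypothetical. `subject_name_diversity` compares the number of instances with the number of distinct subject names, and `entity_only_input` builds the context-free input in which only the two entity names are kept.

```python
def span_text(instance, role):
    """Join the tokens of the subject ("subj") or object ("obj") mention."""
    start, end = instance[f"{role}_start"], instance[f"{role}_end"]
    return " ".join(instance["token"][start:end + 1])

def subject_name_diversity(instances):
    """Return (number of instances, number of distinct subject names)."""
    names = {span_text(inst, "subj") for inst in instances}
    return len(instances), len(names)

def entity_only_input(instance):
    """Keep only the entity names and drop all textual context."""
    return [span_text(instance, "subj"), span_text(instance, "obj")]

# Hypothetical toy data, for illustration only.
toy = [
    {"token": ["Acme", "hired", "Ann", "Lee"],
     "subj_start": 0, "subj_end": 0, "obj_start": 2, "obj_end": 3},
    {"token": ["Bob", "left", "Acme"],
     "subj_start": 2, "subj_end": 2, "obj_start": 0, "obj_end": 0},
]
counts = subject_name_diversity(toy)
names_only = entity_only_input(toy[0])
```

A large gap between the two counts signals limited entity diversity, and a model that scores well on `entity_only_input` alone is exploiting shortcuts from names to relations.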