
Consistent Document-Level Relation Extraction via Counterfactuals

Ali Modarressi, Abdullatif Köksal, Hinrich Schütze

TL;DR

CovEReD, a counterfactual data generation approach for document-level relation extraction datasets based on entity replacement, is presented. Generating document-level counterfactual data with CovEReD and training models on it maintains consistency with minimal impact on RE performance.

Abstract

Many datasets have been developed to train and evaluate document-level relation extraction (RE) models. Most of these are constructed using real-world data. It has been shown that RE models trained on real-world data suffer from factual biases. To evaluate and address this issue, we present CovEReD, a counterfactual data generation approach for document-level relation extraction datasets using entity replacement. We first demonstrate that models trained on factual data exhibit inconsistent behavior: while they accurately extract triples from factual data, they fail to extract the same triples after counterfactual modification. This inconsistency suggests that models trained on factual data rely on spurious signals, such as specific entities and external knowledge, rather than on the input context to extract triples. We show that by generating document-level counterfactual data with CovEReD and training models on it, consistency is maintained with minimal impact on RE performance. We release our CovEReD pipeline as well as Re-DocRED-CF, a dataset of counterfactual RE documents, to assist in evaluating and addressing inconsistency in document-level RE.
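The abstract describes counterfactual generation by entity replacement only at a high level. As an illustration, here is a minimal Python sketch of the general idea, assuming a hypothetical `Document`/`Entity` record with a single surface form per entity, triples stored as entity indices, and replacement names chosen by the caller. It is not the CovEReD pipeline; how CovEReD selects replacement entities is not described in this section.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    name: str   # single surface form used in the text (a simplification)
    etype: str  # entity type, e.g. "PER" or "LOC"


@dataclass
class Document:
    text: str
    entities: list[Entity]
    triples: list[tuple[int, str, int]]  # (head index, relation, tail index)


def replace_entities(doc: Document, replacements: dict[int, str]) -> Document:
    """Build a counterfactual copy of `doc` by renaming the entities whose
    indices appear in `replacements` (hypothetical helper). Triples point to
    entity indices, so they are copied unchanged to the counterfactual."""
    if not replacements:
        return Document(doc.text, list(doc.entities), list(doc.triples))
    mapping = {doc.entities[i].name: new for i, new in replacements.items()}
    # One regex pass over all old names (longest first), so a swap such as
    # {A: B, B: A} does not rewrite the same span twice.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    text = pattern.sub(lambda m: mapping[m.group(1)], doc.text)
    # Keep each entity's type; only its surface form changes.
    entities = [
        Entity(replacements[i], e.etype) if i in replacements else e
        for i, e in enumerate(doc.entities)
    ]
    return Document(text, entities, list(doc.triples))


doc = Document(
    text="Ada Lovelace was born in London.",
    entities=[Entity("Ada Lovelace", "PER"), Entity("London", "LOC")],
    triples=[(0, "place_of_birth", 1)],
)
cf = replace_entities(doc, {0: "Jane Doe", 1: "Lyon"})
print(cf.text)  # Jane Doe was born in Lyon.
```

Because the triples reference entity indices rather than strings, the gold labels carry over to the counterfactual document as-is. A real pipeline would likely also need to handle multiple mentions per entity and to choose replacements of a compatible type; both are left out of this sketch.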

Paper Structure

This paper contains 10 sections, 2 figures, 1 table, and 2 algorithms.

Figures (2)

  • Figure 1: A document from Re-DocRED (Tan et al., 2022) and its counterfactual version generated by entity replacement. A model trained on factual data extracts the original triple but fails on its counterfactual (CF) counterpart, indicating that the model relies on spurious patterns such as entity biases. We address this by generating CF data and training RE models on it.
  • Figure 2: Three further examples of original documents and their counterfactual counterparts. In all three, the model fails to predict the counterfactual triple, even though all the information required to extract the relation is present (underlined).
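Figures 1 and 2 show the inconsistency in action: a triple extracted from the factual document is missed in its counterfactual version. One simple way to quantify this, sketched in Python under illustrative assumptions (not necessarily the measure used in the paper), is the fraction of correctly extracted factual triples that the model also extracts from the counterfactual document. The function name `consistency` and the triple encoding are hypothetical.

```python
import math

Triple = tuple[int, str, int]  # (head entity index, relation, tail entity index)


def consistency(examples: list[tuple[set[Triple], set[Triple], set[Triple]]]) -> float:
    """Micro-averaged share of triples that a model extracts correctly from the
    factual documents and still extracts from their counterfactual versions.

    Each example is (gold, predicted_on_factual, predicted_on_counterfactual).
    Assumes entity replacement keeps entity indices, so triples compare directly.
    Returns NaN when no gold triple was recovered from any factual document.
    """
    correct = 0  # gold triples recovered from the factual document
    kept = 0     # of those, triples also recovered from the counterfactual
    for gold, pred_factual, pred_cf in examples:
        hits = gold & pred_factual
        correct += len(hits)
        kept += len(hits & pred_cf)
    return kept / correct if correct else math.nan


gold = {(0, "place_of_birth", 1), (0, "country_of_citizenship", 2)}
# The model recovers both triples from the factual text but only one after replacement.
score = consistency([(gold, gold, {(0, "place_of_birth", 1)})])
print(score)  # 0.5
```

Conditioning on triples that were correct in the factual document separates inconsistency from ordinary extraction errors: a triple the model never found in the first place does not count against it.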