Rethinking Relation Extraction: Beyond Shortcuts to Generalization with a Debiased Benchmark

Liang He; Yougang Chu; Zhen Wu; Jianbing Zhang; Xinyu Dai; Jiajun Chen

TL;DR

The paper tackles entity bias in relation extraction by introducing DREB, a debiased benchmark built via entity replacement and validated with Bias Evaluator and PPL Evaluator to ensure low bias and high naturalness. It then presents MixDebias, a dual-level debiasing approach combining data-level augmentation with a KL-divergence constraint and model-level causal bias mitigation to improve generalization while preserving performance on standard benchmarks. Extensive experiments demonstrate that DREB effectively reveals debiasing capabilities and that MixDebias achieves superior performance on DREB with robust results on original datasets, outpacing existing methods. The work highlights a practical path to more reliable relation extraction systems and provides public datasets and baselines to advance debiasing research.

Abstract

Benchmarks are crucial for evaluating machine learning algorithm performance, facilitating comparison and identifying superior solutions. However, biases within datasets can lead models to learn shortcut patterns, resulting in inaccurate assessments and hindering real-world applicability. This paper addresses the issue of entity bias in relation extraction tasks, where models tend to rely on entity mentions rather than context. We propose a debiased relation extraction benchmark DREB that breaks the pseudo-correlation between entity mentions and relation types through entity replacement. DREB utilizes Bias Evaluator and PPL Evaluator to ensure low bias and high naturalness, providing a reliable and accurate assessment of model generalization in entity bias scenarios. To establish a new baseline on DREB, we introduce MixDebias, a debiasing method combining data-level and model training-level techniques. MixDebias effectively improves model performance on DREB while maintaining performance on the original dataset. Extensive experiments demonstrate the effectiveness and robustness of MixDebias compared to existing methods, highlighting its potential for improving the generalization ability of relation extraction models. We will release DREB and MixDebias publicly.
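The two ideas named in the abstract, entity replacement and a consistency constraint between original and replaced inputs, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the candidate pools, the token-span convention, and the choice of a symmetric KL term are all assumptions made here for illustration, and the Bias Evaluator and PPL Evaluator filtering steps are omitted.

```python
import math
import random


def replace_entities(tokens, head_span, tail_span, head_pool, tail_pool, rng):
    """Swap the head and tail mentions for entities drawn from candidate pools.

    Spans are (start, end) token indices with end exclusive. The pools are
    hypothetical lists of same-type replacement mentions.
    """
    new_head = rng.choice(head_pool).split()
    new_tail = rng.choice(tail_pool).split()
    # Replace the right-most span first so the earlier indices stay valid.
    edits = sorted(
        [(head_span, new_head), (tail_span, new_tail)],
        key=lambda item: item[0][0],
        reverse=True,
    )
    out = list(tokens)
    for (start, end), replacement in edits:
        out[start:end] = replacement
    return out


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over relation types."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_penalty(p_orig, p_aug):
    """Symmetric KL between predictions on the original and replaced sentence.

    Assumption: the symmetric form is chosen here for illustration only.
    """
    return 0.5 * (kl_divergence(p_orig, p_aug) + kl_divergence(p_aug, p_orig))


tokens = "Steve Jobs founded Apple in 1976".split()
replaced = replace_entities(
    tokens,
    head_span=(0, 2),
    tail_span=(3, 4),
    head_pool=["Maria Lopez"],      # hypothetical PERSON candidates
    tail_pool=["Northwind Labs"],   # hypothetical ORGANIZATION candidates
    rng=random.Random(0),
)
print(" ".join(replaced))
```

A model that reads the context ("founded") rather than the entity names should assign nearly the same relation distribution to both sentences, so the penalty stays near zero; a model relying on entity shortcuts would see it grow.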

Paper Structure (18 sections, 4 equations, 7 figures, 2 tables)


Introduction
Related Work
DREB: A Debiased Relation Extraction Benchmark
Bias Evaluator.
PPL Evaluator.
Benchmark Analysis
Does DREB introduce distribution biases?
Does DREB introduce semantic biases?
MixDebias: A New Baseline on DREB
Data-level debiasing (RDA, Regularized Debias Approach):
Model-level debiasing (CDA, Causal Debias Approach):
Evaluation
Evaluation metric.
Baselines.
Main results.
...and 3 more sections

Figures (7)

Figure 1: An illustrative example of how entity biases can cause models to learn false shortcuts, inevitably resulting in erroneous predictions.
Figure 2: The construction workflow of DREB benchmark.
Figure 3: Comparison of relation type distributions.
Figure 4: Comparison of semantic distributions. The PPL Evaluator can effectively control semantic bias.
Figure 5: The overall workflow of MixDebias.
...and 2 more figures
