What Causes the Failure of Explicit to Implicit Discourse Relation Recognition?

Wei Liu; Stephen Wan; Michael Strube

TL;DR

The paper investigates why classifiers trained on explicit discourse relations (connectives removed) underperform on real implicit relations. It identifies label shift caused by connective removal as a major factor, and provides corpus-level evidence using PDTB 2.0/3.0 and GUM, along with a cosine-similarity label-shift metric. Two mitigation strategies are proposed: filtering noisy explicit examples and joint learning with connectives via a masked connective predictor trained with Gumbel-Softmax. Across experiments, these methods substantially reduce transfer gaps and improve implicit relation recognition, demonstrating generalization beyond PDTB to the GUM dataset. The work offers a practical, data-driven approach to robust explicit-to-implicit discourse relation classification with meaningful implications for discourse parsing systems.
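This summary does not give the exact formulation of the cosine-similarity label-shift metric. As a minimal sketch, assume the metric compares a classifier's predicted relation distribution for the same example with and without its connective, and scores the shift as one minus the cosine similarity of the two distributions. The function names and this definition are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)


def label_shift_score(probs_with_conn, probs_without_conn):
    """Hypothetical label-shift score (assumption, not the paper's definition).

    Both arguments are relation-probability vectors predicted for the same
    example, once with the connective present and once with it removed.
    The score is near 0 when the predictions agree and grows as they diverge.
    """
    return 1.0 - cosine_similarity(probs_with_conn, probs_without_conn)


# Illustrative distributions over four top-level relation senses.
stable = label_shift_score([0.9, 0.05, 0.03, 0.02], [0.85, 0.08, 0.04, 0.03])
shifted = label_shift_score([0.9, 0.05, 0.03, 0.02], [0.05, 0.9, 0.03, 0.02])
print(stable < shifted)
```

Under this assumed definition, an explicit example whose predicted relation changes once the connective is removed receives a higher score, which is the kind of instance the filtering strategy would aim to drop.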

Abstract

We consider an unanswered question in the discourse processing community: why do relation classifiers trained on explicit examples (with connectives removed) perform poorly in real implicit scenarios? Prior work claimed this is due to linguistic dissimilarity between explicit and implicit examples but provided no empirical evidence. In this study, we show that one cause of this failure is a label shift after connectives are eliminated. Specifically, we find that the discourse relations expressed by some explicit instances change when connectives disappear. Unlike previous work, which manually analyzed a few examples, we present empirical evidence at the corpus level to prove the existence of such a shift. We then analyze why label shift occurs by considering factors such as the syntactic role played by connectives, the ambiguity of connectives, and more. Finally, we investigate two strategies to mitigate the label shift: filtering out noisy data and joint learning with connectives. Experiments on PDTB 2.0, PDTB 3.0, and the GUM dataset demonstrate that classifiers trained with our strategies outperform strong baselines.

Paper Structure (30 sections, 7 equations, 10 figures, 8 tables, 3 algorithms)

Introduction
Related Work
Experimental Setup
Label Shift in Discourse Relations
What is label shift?
Do explicit examples suffer label shift?
Does this shift exist at the corpus level?
Can label shift be measured?
Why does label shift happen?
Strategies to alleviate the label shift
Filter out noisy examples
Joint learning with connectives
Experiments
Baselines and upper bounds
Overall results
...and 15 more sections

Figures (10)

Figure 1: Examples of suffering and not suffering the label shift.
Figure 2: Different cases suffering label shift.
Figure 3: Percentage of examples in Explicit and Implicit corpora that receive the same and different predictions when containing and not containing a connective.
Figure 4: Visualization of examples in PDTB 2.0 when containing or not containing a connective.
Figure 5: Feature Importance of the XGBoost Model in predicting the label shift metric on PDTB 2.0 and 3.0.
...and 5 more figures
