What Causes the Failure of Explicit to Implicit Discourse Relation Recognition?
Wei Liu, Stephen Wan, Michael Strube
TL;DR
The paper investigates why classifiers trained on explicit discourse relations (connectives removed) underperform on real implicit relations. It identifies label shift caused by connective removal as a major factor, and provides corpus-level evidence using PDTB 2.0/3.0 and GUM, along with a cosine-similarity label-shift metric. Two mitigation strategies are proposed: filtering noisy explicit examples and joint learning with connectives via a masked connective predictor trained with Gumbel-Softmax. Across experiments, these methods substantially reduce transfer gaps and improve implicit relation recognition, demonstrating generalization beyond PDTB to the GUM dataset. The work offers a practical, data-driven approach to robust explicit-to-implicit discourse relation classification with meaningful implications for discourse parsing systems.
Abstract
We consider an unanswered question in the discourse processing community: why do relation classifiers trained on explicit examples (with connectives removed) perform poorly in real implicit scenarios? Prior work claimed this is due to linguistic dissimilarity between explicit and implicit examples but provided no empirical evidence. In this study, we show that one cause for such failure is a label shift after connectives are eliminated. Specifically, we find that the discourse relations expressed by some explicit instances will change when connectives disappear. Unlike previous work manually analyzing a few examples, we present empirical evidence at the corpus level to prove the existence of such shift. Then, we analyze why label shift occurs by considering factors such as the syntactic role played by connectives, ambiguity of connectives, and more. Finally, we investigate two strategies to mitigate the label shift: filtering out noisy data and joint learning with connectives. Experiments on PDTB 2.0, PDTB 3.0, and the GUM dataset demonstrate that classifiers trained with our strategies outperform strong baselines.
