Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks
Yingxue Fu
TL;DR
This work addresses cross-framework discourse relation mapping between RST-DT and PDTB, where differences in segmentation and relation inventories hinder interoperability. It introduces a fully automatic label-embedding framework based on label-anchored contrastive learning that jointly learns input representations and label embeddings using four losses, $\mathcal{L}_{ICL}$, $\mathcal{L}_{LCL}$, $\mathcal{L}_{LEC}$, and $\mathcal{L}_{ICE}$. Inference uses the cosine similarity $\Phi$ between the input representation $\mathbf{E}_{X_i}$ and the label embeddings. Experiments on RST-DT and PDTB 3.0 demonstrate meaningful cross-framework mappings, with data augmentation and the choice of label encoder affecting performance; PDTB explicit relations are aligned more reliably than implicit ones. The approach yields competitive intrinsic and extrinsic alignment results, enabling interoperable discourse corpora and offering a path to extending alignment to other label inventories and frameworks.
Abstract
Existing discourse corpora are annotated under different frameworks, which differ significantly in their definitions of arguments and relations and in their structural constraints. Despite these surface differences, the frameworks share a basic understanding of discourse relations. The relationship between the frameworks has been an open research question, especially the correlation between the relation inventories they use. A better understanding of this question would help integrate discourse theories and enable interoperability of discourse corpora annotated under different frameworks. However, studies that explore correlations between discourse relation inventories are hindered by differing criteria for discourse segmentation, and they typically require expert knowledge and manual examination. Some semi-automatic methods have been proposed, but they rely on corpora annotated in multiple frameworks in parallel. In this paper, we introduce a fully automatic approach that addresses these challenges. Specifically, we extend the label-anchored contrastive learning method introduced by Zhang et al. (2022b) to learn label embeddings during a classification task. These embeddings are then used to map discourse relations across frameworks. We show experimental results on RST-DT (Carlson et al., 2001) and PDTB 3.0 (Prasad et al., 2018).
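The mapping step can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes that label embeddings for both frameworks are already learned and are comparable in a shared vector space, and it maps each source label to the target label with the highest cosine similarity. The function names `cosine` and `align_labels` are hypothetical, and the three-dimensional vectors are invented toy values rather than learned embeddings or reported results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align_labels(source, target):
    """Map each source label to its most similar target label.

    `source` and `target` are dicts from label name to embedding.
    Returns a dict: source label -> (best target label, similarity).
    """
    mapping = {}
    for src_name, src_vec in source.items():
        best = max(target, key=lambda name: cosine(src_vec, target[name]))
        mapping[src_name] = (best, cosine(src_vec, target[best]))
    return mapping


# Toy embeddings (assumption: invented for illustration only).
rst_labels = {
    "Cause": [0.9, 0.1, 0.0],
    "Contrast": [0.0, 0.2, 0.9],
}
pdtb_labels = {
    "Contingency.Cause": [0.8, 0.2, 0.1],
    "Comparison.Contrast": [0.1, 0.1, 0.8],
}

mapping = align_labels(rst_labels, pdtb_labels)
for src, (tgt, score) in mapping.items():
    print(f"{src} -> {tgt} (cosine = {score:.3f})")
```

A nearest-label lookup of this kind always returns some target label, so in practice one might also inspect the full similarity matrix or apply a threshold before treating a pair as a correspondence.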
