Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks

Yingxue Fu

TL;DR

This work addresses cross-framework discourse relation mapping between RST-DT and PDTB, where segmentation and inventory differences hinder interoperability. It introduces a fully automatic label-embedding framework based on label-anchored contrastive learning that jointly learns input representations and label embeddings using four losses $\mathcal{L}_{ICL}$, $\mathcal{L}_{LCL}$, $\mathcal{L}_{LEC}$, and $\mathcal{L}_{ICE}$, with inference by cosine similarity $\Phi$ between $\mathbf{E}_{X_i}$ and label embeddings. Experiments on RST-DT and PDTB 3.0 demonstrate meaningful cross-framework mappings, with data augmentation and label-encoder choices affecting performance; PDTB explicit relations are more reliably aligned than implicit ones. The approach yields competitive intrinsic and extrinsic alignment results, enabling interoperable discourse corpora and offering a path to extending alignment to other label inventories and frameworks.
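The inference step summarized above, choosing the label whose embedding has the highest cosine similarity to the input representation, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `cosine` and `predict_label`, the three label names, and the 3-dimensional toy vectors are all assumptions made for the example, and the learnt encoders that would produce real embeddings are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_label(input_embedding, label_embeddings):
    """Return the label whose embedding is most similar to the input."""
    return max(
        label_embeddings,
        key=lambda name: cosine(input_embedding, label_embeddings[name]),
    )


# Hypothetical toy embeddings; the values are invented for illustration.
label_embeddings = {
    "Contrast": [1.0, 0.0, 0.0],
    "Cause": [0.0, 1.0, 0.0],
    "Elaboration": [0.0, 0.0, 1.0],
}

# Stand-in for the representation of one input instance.
input_embedding = [0.1, 0.9, 0.2]

predicted = predict_label(input_embedding, label_embeddings)
```

Because cosine similarity ignores vector length, only the direction of the input representation relative to each label embedding affects the prediction in this sketch.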

Abstract

Existing discourse corpora are annotated under different frameworks, which differ substantially in their definitions of arguments and relations and in their structural constraints. Despite these surface differences, the frameworks share a basic understanding of discourse relations. How the frameworks relate to one another remains an open research question, particularly how the relation inventories used in different frameworks correlate. A better understanding of this question would help integrate discourse theories and enable interoperability of discourse corpora annotated under different frameworks. However, studies of correlations between discourse relation inventories are hindered by differing criteria for discourse segmentation, and they typically require expert knowledge and manual examination. Some semi-automatic methods have been proposed, but they rely on corpora annotated in multiple frameworks in parallel. In this paper, we introduce a fully automatic approach to address these challenges. Specifically, we extend the label-anchored contrastive learning method introduced by Zhang et al. (2022b) to learn label embeddings during a classification task. These embeddings are then used to map discourse relations across frameworks. We show experimental results on RST-DT (Carlson et al., 2001) and PDTB 3.0 (Prasad et al., 2018).

Paper Structure (17 sections, 10 equations, 3 figures, 7 tables)

Introduction
Related Work
Method
Experiments
Data Preprocessing
Hyperparameters and Training
Results
Data Augmentation for RST
Separate Experiments on PDTB Explicit and Implicit Relations
Ablation Study
RST-PDTB Relation Mapping
Mapping Results
Extrinsic Evaluation
Conclusions
Acknowledgments
...and 2 more sections

Figures (3)

Figure 1: RST-style annotation (wsj_0624 in RST-DT).
Figure 2: Illustration of the correlation matrix $\mathbf{M}$. $\mathbf{E}_{1 \ldots k}$ represents the $k$ learnt label embeddings and $\mathbf{H}_{1 \ldots k}$ denotes the $k$ class representation proxies. After normalization, the average of the values on the diagonal (colored) is the overall measure of the quality of the learnt label embeddings.
Figure 3: (a) Label embeddings learnt with data augmentation. (b) Label embeddings learnt without data augmentation. For visualization, we choose the label embeddings with the highest score from the three runs.
