A Novel Dependency Framework for Enhancing Discourse Data Analysis

Kun Sun; Rong Wang

TL;DR

The paper tackles fragmentation across discourse corpora annotated under different theories by proposing a universal dependency representation. It converts PDTB annotations into local dependencies and validates this conversion across English, Chinese, and other languages using refined BERT-based discourse parsers. It demonstrates a strong correlation between RST and PDTB dependencies and shows cross-linguistic applicability, suggesting generalizability beyond English. The framework enables unified quantitative analysis of discourse, supports larger multilingual datasets for training, and offers a potential prompt structure for state-of-the-art language models.

Abstract

Different theories of discourse structure have given rise to discourse corpora built on different theoretical foundations, which makes it difficult to explore these corpora in a consistent and cohesive way. This study focuses primarily on converting PDTB annotations into dependency structures. It employs refined BERT-based discourse parsers to test the validity of the dependency data derived from PDTB-style corpora in English, Chinese, and several other languages. By converting both PDTB and RST annotations for the same texts into dependencies, the study also applies "dependency distance" metrics to examine the correlation between RST dependencies and PDTB dependencies in English. The results show that the PDTB dependency data is valid and that the two types of dependency distance are strongly correlated. The study thus presents a comprehensive approach to analyzing and evaluating discourse corpora, using discourse dependencies to achieve unified analysis: with dependency representations, data can be extracted from PDTB, RST, and SDRT corpora in a coherent and unified manner. Moreover, the cross-linguistic validation establishes the framework's generalizability beyond English. This comprehensive dependency framework overcomes limitations of existing discourse corpora, supports a diverse range of algorithms, and facilitates further studies in computational discourse analysis and the language sciences.
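As a rough illustration of the "dependency distance" comparison described in the abstract, the sketch below computes the mean and standard deviation of dependency distance per document and then correlates the RST-derived values with the PDTB-derived ones. It assumes, as is common in syntactic dependency work, that distance is the absolute difference between the linear positions of the head EDU and the dependent EDU; the paper's exact definition is not given in this summary. The function names and the toy edge lists are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def dependency_distances(deps):
    """Return |head - dependent| for each (head, dependent) pair of EDU positions.

    Assumption: distance is the absolute difference of linear EDU positions.
    """
    return [abs(head - dep) for head, dep in deps]


def distance_summary(deps):
    """Mean and (population) standard deviation of dependency distance."""
    dists = dependency_distances(deps)
    return mean(dists), pstdev(dists)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: one list of (head, dependent) edges per document,
# with the same three documents annotated in both styles.
rst_docs = [[(1, 2), (1, 3)], [(2, 1), (2, 5), (5, 4)], [(1, 2), (2, 3), (3, 6)]]
pdtb_docs = [[(1, 2), (2, 3)], [(1, 2), (2, 4), (4, 5)], [(1, 2), (2, 3), (3, 5)]]

rst_means = [distance_summary(doc)[0] for doc in rst_docs]
pdtb_means = [distance_summary(doc)[0] for doc in pdtb_docs]
print(pearson(rst_means, pdtb_means))
```

The same `pearson` call can be applied to the per-document standard deviations to mirror the mean/SD comparison named in the paper outline.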

Paper Structure (17 sections, 2 equations, 4 figures, 5 tables)

Introduction
Related Work
Methods
Dependency parsing
PDTB converted into dependency representations
Example
Discourse distance & the variation of dependency distance
Experiments
Parser evaluations on PDTB dependency representations across languages
The correlation between mean/SD discourse distance of RST and PDTB
Discussion
Conclusion
Appendix
PDTB Example and annotations
The PDTB annotation system
...and 2 more sections

Figures (4)

Figure 1: The RST-tree of an example
Figure 2: Panel A represents the original PDTB annotations (WSJ_0618), in which a discourse connective is a head governing two discourse units. Panel B represents the data structure converted from Panel A.
Figure 3: Panel A: The RST tree of Fig. 1 converted to a dependency tree. An EDU in RST is similar to a word in syntactic dependency analysis. The node at which the arrowhead is located is a head (or governor). Here, "e" denotes "EDU". Panel B: The text in Fig. 1 can also be annotated in the PDTB style. Given that the EDUs in this text remain the same in the PDTB annotations, the connectives are heads that form dependencies. Panel C: The relations in Panel B can be treated as dependencies between EDUs, as well as co-occurring network data, as shown in the bottom panel of Table 1.
Figure 4: PDTB-3 Sense Hierarchy. The leftmost column contains the Level-1 senses and the middle column, the Level-2 senses.
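The conversion described in the captions of Figures 2 and 3, from a connective that governs two discourse units to a direct dependency between those units, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `PdtbRelation` type, its field names, and the choice of Arg1 as the head are all assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdtbRelation:
    """Hypothetical container for one PDTB-style relation over EDU indices."""
    connective: str  # e.g. "but"; an empty string could stand for an implicit relation
    sense: str       # e.g. "Comparison.Contrast"
    arg1: int        # position of the EDU serving as Arg1
    arg2: int        # position of the EDU serving as Arg2


def to_dependencies(relations):
    """Turn connective-headed relations into (head, dependent, label) triples.

    Assumption: Arg1 is treated as the head and Arg2 as the dependent, and
    the sense becomes the edge label. The paper may choose differently.
    """
    return [(r.arg1, r.arg2, r.sense) for r in relations]


relations = [
    PdtbRelation("but", "Comparison.Contrast", 1, 2),
    PdtbRelation("because", "Contingency.Cause", 2, 3),
]
for head, dependent, label in to_dependencies(relations):
    print(f"e{head} -> e{dependent} [{label}]")
```

Representing the result as plain triples keeps it usable both as a dependency edge list and as co-occurrence network data, in the spirit of Panel C of Figure 3.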
