eRST: A Signaled Graph Theory of Discourse Relations and Organization
Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler
TL;DR
This work introduces eRST, a signaled graph theory that extends RST by allowing tree-breaking, non-projective, and concurrent discourse relations anchored to token-level signals. It defines a formal three-part framework (primary tree, secondary edges, and signals), builds a large, publicly available eRST-annotated GUM corpus across genres, and delivers a parsing baseline combining existing state-of-the-art components for primary trees, connectives, morphosyntax, and coreference with a novel transformer-based predictor for secondary edges. Empirical results show strong performance for primary-tree parsing but highlight the ongoing challenge of reliably predicting secondary edges and signal anchors, underscoring the importance of correct primary structures for downstream tasks. The paper argues that eRST enhances explainability, enables richer relation extraction, and opens avenues for detailed attribution analysis and downstream applications, while providing data and tools to support further development and cross-lingual extension.
Abstract
In this article, we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, non-projective, and concurrent relations, as well as implicit and explicit signals that give explainable rationales for our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB), and Discourse Dependencies, and address these using constructs in the proposed theory. We provide annotation, search, and visualization tools for data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics, and applications for data in our framework.
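To make the three-part organization (primary tree, secondary edges, and token-anchored signals) concrete, the following is a minimal Python sketch of how such a graph might be held in memory. It is an illustration under stated assumptions, not the data model or file format used by the eRST tools: the class names, field names, relation labels, the signal type "dm", and the toy sentence are all hypothetical. It also assumes a simplified encoding in which every relation is a directed edge from a unit to its attachment point, so that the primary relations form a tree only if each unit has at most one outgoing primary edge.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed discourse relation between two units (hypothetical encoding)."""
    source: int              # id of the attached unit
    target: int              # id of the unit it attaches to
    label: str               # relation name; the label set here is illustrative
    secondary: bool = False  # False: primary tree edge, True: secondary edge


@dataclass(frozen=True)
class Signal:
    """Evidence for a relation, anchored to zero or more token positions."""
    relation: Relation
    signal_type: str              # e.g. "dm" for a discourse marker (assumed name)
    tokens: Tuple[int, ...] = ()  # 0-based token indices; empty if unanchored


@dataclass
class DiscourseGraph:
    tokens: List[str]
    relations: List[Relation] = field(default_factory=list)
    signals: List[Signal] = field(default_factory=list)

    def primary_single_parent(self) -> bool:
        """Check that no unit has more than one outgoing primary relation.

        This is a necessary condition for the primary edges to form a tree;
        it does not check for cycles or connectedness.
        """
        sources = [r.source for r in self.relations if not r.secondary]
        return len(sources) == len(set(sources))

    def concurrent_pairs(self) -> List[Tuple[int, int]]:
        """Return (source, target) pairs that carry more than one relation."""
        counts = Counter((r.source, r.target) for r in self.relations)
        return sorted(pair for pair, n in counts.items() if n > 1)

    def anchor_text(self, signal: Signal) -> List[str]:
        """Look up the surface tokens a signal is anchored to."""
        return [self.tokens[i] for i in signal.tokens]


# Toy example (invented for illustration): unit 1 = "I stayed home",
# unit 2 = "because it rained".
g = DiscourseGraph(tokens=["I", "stayed", "home", "because", "it", "rained"])
primary = Relation(source=2, target=1, label="cause")
extra = Relation(source=2, target=1, label="explanation", secondary=True)
g.relations += [primary, extra]
g.signals.append(Signal(relation=primary, signal_type="dm", tokens=(3,)))
```

In this sketch, the secondary edge shares its endpoints with the primary edge, which is one way to model concurrent relations; a tree-breaking edge would instead be a secondary relation whose endpoints differ from every primary edge. Keeping signals as separate objects that point to a relation allows a single relation to carry several signals, or none.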
