Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation
Adithya Kulkarni, Oliver Eulenstein, Qi Li
TL;DR
The paper addresses the instability of dependency parser quality across domains and languages through unsupervised post-processing aggregation of dependency tree structures (DTS). It reframes DTS aggregation as an edge-level binary labeling task and compares three frameworks (MST, CRH, and CIM) on 71 Universal Dependencies (UD) test treebanks covering 49 languages. CIM, which models correlations among the input parsers via majority voting and a learned probabilistic joint distribution, performs best, outperforming strong LLM-based parsers and previous baselines. The result is a language- and domain-agnostic way to stabilize dependency parsing without labeled data; extending the approach to relation labels is left to future work.
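To make the edge-level binary labeling view concrete, here is a minimal Python sketch of plain majority voting over candidate edges. It is an illustrative baseline only, not the paper's CIM, CRH, or MST framework. The function names, the head-index encoding (0 for the artificial root), and the toy parses are all assumptions introduced for this example.

```python
from collections import Counter

def edge_votes(parses):
    """Count how many parsers propose each (head, dependent) edge.

    Assumed encoding: parses[k][i] is the head of token i+1 according to
    parser k, and 0 denotes the artificial root.
    """
    votes = Counter()
    for heads in parses:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] += 1
    return votes

def majority_edges(parses):
    """Label an edge 1 if more than half of the parsers propose it, else 0."""
    threshold = len(parses) / 2
    return {
        edge: int(count > threshold)
        for edge, count in edge_votes(parses).items()
    }

# Toy input: three hypothetical parsers on a three-token sentence.
parses = [
    [2, 0, 2],  # parser A
    [2, 0, 2],  # parser B
    [0, 1, 2],  # parser C
]
labels = majority_edges(parses)
```

Edges kept by a simple majority are not guaranteed to form a valid tree in general, which is one reason a tree-constrained or correlation-aware aggregation step may be preferable to raw voting.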
Abstract
Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parser quality often varies with the domain and the language involved, so this variation must be addressed to achieve stable performance. In various NLP tasks, aggregation methods are applied as a post-processing step and have been shown to mitigate varying quality. However, post-processing aggregation has not been sufficiently studied for dependency parsing. In an extensive empirical study, we compare different unsupervised post-processing aggregation methods to identify the most suitable method for dependency tree structure aggregation.
