Tree-Averaging Algorithms for Ensemble-Based Unsupervised Discontinuous Constituency Parsing

Behzad Shayegh; Yuqiao Wen; Lili Mou

TL;DR

This work builds an ensemble of different runs of the existing discontinuous parser by averaging their predicted trees, which stabilizes and boosts performance, and develops an efficient exact algorithm for the averaging task.
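The TL;DR leaves "averaging trees" informal. Below is a minimal sketch of one natural scoring rule. It assumes, without confirmation from this summary, that the average tree is the one maximizing the summed unlabeled F1 against the ensemble members. Every constituent is encoded as a frozenset of word positions, so discontinuous constituents fit the same representation. The names `f1`, `average_f1` and `hit_count` are hypothetical.

```python
from typing import FrozenSet, List, Set

Constituent = FrozenSet[int]  # word positions covered by a node (may be non-contiguous)
Tree = Set[Constituent]       # a tree as its set of non-singleton constituents


def f1(pred: Tree, gold: Tree) -> float:
    """Unlabeled F1 between two trees given as constituent sets."""
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f1(candidate: Tree, ensemble: List[Tree]) -> float:
    """Mean F1 of a candidate tree against every ensemble member."""
    return sum(f1(candidate, t) for t in ensemble) / len(ensemble)


def hit_count(candidate: Tree, ensemble: List[Tree]) -> int:
    """Total number of (constituent, ensemble tree) matches."""
    return sum(len(candidate & t) for t in ensemble)


# Toy ensemble of three binary trees over a 3-word sentence.
root, c01, c02 = frozenset({0, 1, 2}), frozenset({0, 1}), frozenset({0, 2})
ensemble = [{root, c01}, {root, c01}, {root, c02}]
print(hit_count({root, c01}, ensemble), round(average_f1({root, c01}, ensemble), 4))
```

Every binary tree over n words has exactly n - 1 non-singleton constituents. Precision therefore equals recall, and F1 reduces to overlap / (n - 1). Under this assumed objective, maximizing the summed F1 is the same as maximizing the hit count, which is why the sketch tracks both.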

Abstract

We address unsupervised discontinuous constituency parsing, where we observe a high variance in the performance of the only previous model in the literature. We propose to build an ensemble of different runs of the existing discontinuous parser by averaging the predicted trees, to stabilize and boost performance. To begin with, we provide a comprehensive computational complexity analysis (in terms of P and NP-completeness) for tree averaging under different setups of binarity and continuity. We then develop an efficient exact algorithm for the task, which runs in reasonable time for all samples in our experiments. Results on three datasets show that our method outperforms all baselines in all metrics; we also provide in-depth analyses of our approach.
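The next sketch makes the search space concrete with a brute-force exact search. It assumes the hit-count objective, meaning the number of ensemble trees that contain each constituent. It searches binary trees whose nodes may be any non-contiguous set of words, with an optional fan-out cap. Its cost is exponential in sentence length, and it is not the paper's efficient algorithm. `exact_average_tree` and `fan_out` are hypothetical names.

```python
from functools import lru_cache
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Constituent = FrozenSet[int]  # word positions covered by a node (may be non-contiguous)


def fan_out(span: Constituent) -> int:
    """Number of maximal contiguous blocks in a set of word positions."""
    s = sorted(span)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b != a + 1)


def exact_average_tree(
    n: int,
    ensemble: Iterable[Iterable[Constituent]],
    max_fan_out: Optional[int] = None,
) -> Tuple[int, FrozenSet[Constituent]]:
    """Exhaustively find the binary tree over n words with the most ensemble hits.

    Roughly 3^n work: a reference for short sentences only.
    """
    if max_fan_out is not None and max_fan_out < 1:
        raise ValueError("max_fan_out must be at least 1")
    # Hit count: in how many ensemble trees each constituent appears.
    hits: Dict[Constituent, int] = {}
    for tree in ensemble:
        for c in set(tree):
            hits[c] = hits.get(c, 0) + 1

    def allowed(span: Constituent) -> bool:
        return max_fan_out is None or fan_out(span) <= max_fan_out

    @lru_cache(maxsize=None)
    def best(span: Constituent) -> Tuple[int, FrozenSet[Constituent]]:
        if len(span) == 1:
            return 0, frozenset()  # single words are not counted as constituents
        items = sorted(span)
        first, rest = items[0], items[1:]
        best_score, best_tree = -1, frozenset()
        # The left child always contains the leftmost word, so each unordered
        # split is enumerated once; k < len(rest) keeps the right child non-empty.
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                left = frozenset((first,) + extra)
                right = span - left
                if not (allowed(left) and allowed(right)):
                    continue
                left_score, left_tree = best(left)
                right_score, right_tree = best(right)
                if left_score + right_score > best_score:
                    best_score = left_score + right_score
                    best_tree = left_tree | right_tree
        return hits.get(span, 0) + best_score, best_tree | {span}

    return best(frozenset(range(n)))
```

With `max_fan_out=1`, the search is restricted to continuous trees. The same ensemble can then yield a different average once discontinuity is disallowed.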

Paper Structure (22 sections, 12 theorems, 23 equations, 9 figures, 3 tables)

Introduction
Approach
Unsupervised Discontinuous Constituency Parsing
Averaging over Constituency Trees
Our Search Algorithm
Candidate Constituents Pruning
Experiments
Settings
Results and Analyses
Related Work
Conclusion
Limitations
Proofs
Proof of the theorem on the bounded-discontinuous case
Proof of the theorem on the non-binary bounded-discontinuous case
...and 7 more sections

Key Result

Theorem 1

The bounded-discontinuous tree-averaging problem belongs to P.
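This sketch assumes that "bounded-discontinuous" means fan-out is capped by a constant. The simplest instance of that setting is the continuous case (fan-out 1). There, a CYK-style dynamic program finds the binary tree with the most ensemble hits in O(n^3) time. The code covers only that special case and is neither the paper's algorithm nor its proof for general bounded fan-out. `cyk_average_tree` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int]  # half-open interval [i, j) of word positions


def cyk_average_tree(n: int, ensemble: Iterable[Iterable[Span]]) -> Tuple[int, Set[Span]]:
    """Binary continuous tree over n >= 1 words maximizing total ensemble hits, O(n^3)."""
    # Hit count: in how many ensemble trees each span appears.
    hits: Dict[Span, int] = {}
    for tree in ensemble:
        for span in set(tree):
            hits[span] = hits.get(span, 0) + 1

    score: Dict[Span, int] = {(i, i + 1): 0 for i in range(n)}  # single words score 0
    split: Dict[Span, int] = {}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Best split point for [i, j), using already-solved shorter spans.
            k = max(range(i + 1, j), key=lambda m: score[(i, m)] + score[(m, j)])
            score[(i, j)] = hits.get((i, j), 0) + score[(i, k)] + score[(k, j)]
            split[(i, j)] = k

    # Backtrack from the root to recover the chosen constituents.
    chosen: Set[Span] = set()
    stack: List[Span] = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue
        chosen.add((i, j))
        k = split[(i, j)]
        stack.extend([(i, k), (k, j)])
    return score[(0, n)], chosen
```

Only contiguous spans are considered, so the table has O(n^2) cells with O(n) split points each. Allowing arbitrary non-contiguous nodes instead makes the space of candidate nodes grow exponentially.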

Figures (9)

Figure 1: (a) A continuous parse structure in English. (b) An arguably discontinuous parse structure in English. (c) A discontinuous parse structure in German. Interesting structures (binarity and fan-out) are illustrated.
Figure 2: $F_1$ scores on continuous and discontinuous constituents in the NEGRA test set (Skut et al., 1997), where each point is a random run of TN-LCFRS (Yang et al., 2023).
Figure 3: Effect of the number of ensemble individuals on LASSY. (a) Averaged over $30$ trials, with error bars indicating standard deviations. (b) Best-to-worst incremental ensemble.
Figure 4: Effectiveness of pruning on LASSY for (a) different sentence lengths and (b) different numbers of ensemble individuals. Note that the dashed orange line falls outside the $y$-axis range in (b).
Figure 5: Wall clock run time of tree-averaging algorithms on LASSY for different sentence lengths, using an Intel(R) Core(TM) i9-9940X (@3.30GHz) CPU.
...and 4 more figures
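Figure 2 reports separate scores for continuous and discontinuous constituents. The sketch below shows one way to compute such per-category scores. It assumes a constituent is discontinuous when its fan-out exceeds 1 and that F1 is micro-averaged over the corpus. The paper's exact evaluation conventions (for example, whether the root or labels are counted) are not stated here, and `category_f1` is a hypothetical name.

```python
from typing import FrozenSet, Iterable, Set

Constituent = FrozenSet[int]  # word positions covered by a node


def fan_out(span: Constituent) -> int:
    """Number of maximal contiguous blocks in a set of word positions."""
    s = sorted(span)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b != a + 1)


def category_f1(
    preds: Iterable[Set[Constituent]],
    golds: Iterable[Set[Constituent]],
    discontinuous: bool,
) -> float:
    """Corpus-level (micro-averaged) F1 restricted to one constituent category."""
    tp = fp = fn = 0
    for pred, gold in zip(preds, golds):
        # Keep only constituents of the requested category in both trees.
        p = {c for c in pred if (fan_out(c) > 1) == discontinuous}
        g = {c for c in gold if (fan_out(c) > 1) == discontinuous}
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Micro-averaging pools counts across sentences. This matters here because discontinuous constituents are sparse, and many sentences contain none.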

Theorems & Definitions (22)

Theorem 1 (with proof sketch)
Theorem 2 (with proof sketch)
Theorem 3 (with proof sketch)
Lemma 1
Theorem 4 (with proof)
Theorem 5: Lower bound
...and 12 more
