The expected sum of edge lengths in planar linearizations of trees. Theory and applications

Lluís Alemany-Puig, Ramon Ferrer-i-Cancho

TL;DR

This work analyzes planar linearizations of dependency trees as a relaxation of projectivity and derives the exact expected sum of edge lengths $D(T)$ under planarity, linking it to projective expectations via $\mathbb{E}_{\mathrm{pl}}[D(T)] = \frac{1}{n}\sum_{u\in V}\mathbb{E}_{\mathrm{pr}}^{\diamond}[D(T^{u})] = \frac{(n-1)(n-2)}{6n} + \frac{1}{n}\sum_{u\in V}\mathbb{E}_{\mathrm{pr}}[D(T^{u})]$. The theory develops a segment-based characterization of planar arrangements, provides the counting formula $N_{\mathrm{pl}}(T)=n\prod_{u\in V} d(u)!$ for the number of planar arrangements, and introduces $O(n)$-time uniform generators for planar and projective orders. An $O(n)$-time algorithm computes $\mathbb{E}_{\mathrm{pl}}[D(T)]$, enabling efficient baselines for dependency-distance minimization studies. Empirical analysis on Parallel Universal Dependencies shows that stronger constraints reduce the gap between actual dependency distances and random baselines, supporting the use of planar baselines to better gauge the strength of dependency-distance minimization across languages and annotation schemes.
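The counting formula $N_{\mathrm{pl}}(T)=n\prod_{u\in V} d(u)!$ can be sanity-checked on small trees by exhaustive enumeration. The sketch below is not the paper's algorithm: it is a brute-force check that takes time factorial in $n$, and the function names, the edge-list tree encoding, and the example trees are assumptions made for illustration. Two edges are taken to cross when their endpoints interleave in the linear order.

```python
from itertools import combinations, permutations
from math import factorial, prod


def is_planar(edges, pos):
    """True if no two edges cross when vertex v is placed at position pos[v]."""
    spans = [tuple(sorted((pos[a], pos[b]))) for a, b in edges]
    for (a, b), (c, d) in combinations(spans, 2):
        # Two edges cross when their endpoints strictly interleave.
        if a < c < b < d or c < a < d < b:
            return False
    return True


def count_planar_brute_force(n, edges):
    """Count planar arrangements by testing all n! permutations (small n only)."""
    return sum(1 for pos in permutations(range(n)) if is_planar(edges, pos))


def count_planar_formula(n, edges):
    """Evaluate n * prod_u d(u)!, with d(u) the degree of u in the free tree."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return n * prod(factorial(d) for d in degree)


# Hypothetical example trees, given as edge lists over vertices 0..n-1.
PATH4 = [(0, 1), (1, 2), (2, 3)]
STAR5 = [(0, 1), (0, 2), (0, 3), (0, 4)]
DOUBLE_STAR6 = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]

if __name__ == "__main__":
    for n, edges in [(4, PATH4), (5, STAR5), (6, DOUBLE_STAR6)]:
        print(n, count_planar_brute_force(n, edges), count_planar_formula(n, edges))
```

For the 4-vertex path, only the two end edges can cross, which rules out 8 of the 24 permutations and leaves 16, matching $4 \cdot 1!\,2!\,2!\,1!$.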

Abstract

Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as a random baseline.
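The stated relationship between the planar and projective expectations, $\mathbb{E}_{\mathrm{pl}}[D(T)] = \frac{(n-1)(n-2)}{6n} + \frac{1}{n}\sum_{u\in V}\mathbb{E}_{\mathrm{pr}}[D(T^{u})]$, can be checked numerically on small trees. The following is a minimal brute-force sketch, not the $O(n)$-time algorithm of the paper: it enumerates all permutations, keeps the planar ones (no crossing edges) or the projective ones (planar and root not covered), and averages $D$ exactly with rationals. All function names and the edge-list encoding are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations


def edge_spans(edges, pos):
    """Sorted (left, right) positions of each edge under arrangement pos."""
    return [tuple(sorted((pos[a], pos[b]))) for a, b in edges]


def is_planar(edges, pos):
    """No two edges cross, i.e. no two spans strictly interleave."""
    return not any(a < c < b < d or c < a < d < b
                   for (a, b), (c, d) in combinations(edge_spans(edges, pos), 2))


def is_projective(edges, pos, root):
    """Planar, and the root is not strictly inside the span of any edge."""
    covered = any(a < pos[root] < b for a, b in edge_spans(edges, pos))
    return not covered and is_planar(edges, pos)


def sum_edge_lengths(edges, pos):
    """D: sum over edges of the distance between the endpoint positions."""
    return sum(abs(pos[a] - pos[b]) for a, b in edges)


def expected_D(n, edges, accept):
    """Exact mean of D over the permutations accepted by the predicate."""
    values = [sum_edge_lengths(edges, pos)
              for pos in permutations(range(n)) if accept(pos)]
    return Fraction(sum(values), len(values))


def planar_vs_projective(n, edges):
    """Return (E_pl[D(T)], right-hand side built from the E_pr[D(T^u)])."""
    e_pl = expected_D(n, edges, lambda pos: is_planar(edges, pos))
    e_pr_sum = sum(
        expected_D(n, edges, lambda pos, u=u: is_projective(edges, pos, u))
        for u in range(n)
    )
    rhs = Fraction((n - 1) * (n - 2), 6 * n) + e_pr_sum / n
    return e_pl, rhs


if __name__ == "__main__":
    # Hypothetical example: the path on 3 vertices with center 1.
    print(planar_vs_projective(3, [(0, 1), (1, 2)]))
```

For the 3-vertex path, both sides evaluate to $8/3$: the planar mean is $(2\cdot 2 + 4\cdot 3)/6$, and the right-hand side is $\frac{1}{9} + \frac{1}{3}\left(\frac{8}{3} + \frac{5}{2} + \frac{5}{2}\right)$.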

Paper Structure

This paper contains 20 sections, 6 theorems, 57 equations, 8 figures, 3 tables, 4 algorithms.

Key Result

Theorem 1.1

Given a free tree $T=(V,E)$, where $\mathbb{E}_{\mathrm{pr}}^{\diamond}[D(T^{u})]$ is the expected value of $D(T^{u})$ in uniformly random projective arrangements $\pi$ of $T^{u}$ such that $\pi(u)=1$ and $\mathbb{E}_{\mathrm{pr}}[D(T^{u})]$ (Equation eq:introduction:E_pr_D) is th…

Figures (8)

  • Figure 1: Examples of sentences with their syntactic dependency structures; arc labels indicate dependency distance (in words) between linked words. The rectangles denote the root word in each sentence. a) A projective dependency tree (adapted from Gross2009a). b) Planar (but not projective) syntactic dependency structure (adapted from Gross2009a). c) Non-projective and non-planar syntactic dependency structure (adapted from Nivre2009a).
  • Figure 2: Examples of sentences with their syntactic dependency structures; arc labels indicate dependency distance. The rectangles denote the root word in each sentence. Examples adapted from Morrill2000a. The sums of edge lengths are $D=18$ for a) and $D=12$ for b).
  • Figure 3: a) A free tree $T$, where $d(u)=4$, and $d(v)=5$; in this tree, $s_{u}(v)=5$ and $s_{v}(u)=4$. b) The free tree $T$ rooted at $u$, denoted as $T^{u}$, where $d_{u}(u) = d_{T^{u}}(u) = d(u)=4$, and where $4 = d_{u}(v) = d_{T^{u}}(v) < d(v)=5$. Figure borrowed from Hochberg2003a, Alemany2022a.
  • Figure 4: Illustration of an edge's anchor $\alpha_{r u}(\pi)$ and coanchor $\beta_{r u}(\pi)$. In this figure, $u,v,w\in\Gamma(r)$. Figure adapted from Alemany2022b.
  • Figure 5: a) A rooted tree $T^{r}$ where $\Gamma(r)=\{r_1, \dots, r_p\}$ are the $p$ children of $r$. The subtree $T_{r_1}^{r}$ has been circled for clarity. b) An example of a permutation of the segments in $\Phi_r$ associated to the root. c) An example of a permutation of the segments in $\Phi_{r_1}$ associated to $r_1$, the segment at the leftmost position in the example in (b). The dash-dotted edges in (b) and (c) represent the same edge of the tree. In (b) and (c), respectively, $r$ and $r_1$ are segments of length 1.
  • ...and 3 more figures

Theorems & Definitions (12)

  • Theorem 1.1
  • Proposition 1
  • proof
  • proof: Proof of Theorem 1.1
  • Lemma 2.1
  • proof
  • Proposition 2
  • proof
  • Lemma 3.1
  • proof
  • ...and 2 more