A Systematic Comparison of Syntactic Representations of Dependency Parsing

Guillaume Wisniewski, Ophélie Lacroix

TL;DR

This paper examines how the choice of syntactic annotation scheme affects dependency parsing performance across languages. It proposes seven transformation rules that convert UD representations into alternative structures and evaluates them across 38 languages using 266 transformed corpora (with 44 identical cases). Parsers are trained in a transition-based arc-eager setup with a dynamic oracle and evaluated with Unlabeled Attachment Score (UAS). Results show that the UD scheme generally yields higher UAS than the transformed representations (average difference 0.66, up to 8.1), though some languages benefit from certain transformations; learnability metrics fail to reliably predict which representation will be best. The findings underscore the practical impact of annotation choices on cross-language parsing and highlight the limitations of current predictability criteria for representation selection.
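As a reminder of what the evaluation metric measures, the following minimal sketch computes UAS for a single sentence as the fraction of tokens whose predicted head matches the gold head. The function name `unlabeled_attachment_score`, the head-index encoding (0 for the root), and the toy sentence are illustrative assumptions, not taken from the paper, which may also aggregate over a whole corpus or treat punctuation differently.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int],
                               pred_heads: Sequence[int]) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head.

    Assumption: heads are given as one integer per token, with 0 denoting
    the artificial root. Dependency labels are ignored, as UAS is unlabeled.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


# Hypothetical 4-token sentence: the parser attaches the last token wrongly.
gold = [2, 0, 2, 3]
pred = [2, 0, 2, 2]
score = unlabeled_attachment_score(gold, pred)
print(f"UAS = {score:.2f}")
```

Because UAS ignores labels, it is only affected by transformations that change which token heads which, which is the kind of change the conversion rules introduce.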

Abstract

We compare the performance of a transition-based parser with regard to different annotation schemes. We propose to convert some specific syntactic constructions observed in the universal dependency treebanks into a so-called more standard representation and to evaluate parsing performance over all the languages of the project. We show that the "standard" constructions do not systematically lead to better parsing performance and that the scores vary considerably across languages.

Paper Structure

This paper contains 15 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Cases of non-projectivity caused by conversion, and correction. The main (bold) dependency $w_i \curvearrowright w_j$ is the one to invert. When inverting, $w_j$ becomes the root of the sub-structure.
  • Figure 2: Multi-word expression conversion for the Danish phrase 'it is because of him'. The dependencies following the UD conventions are represented in blue above the words; the alternative structure is represented in green below the words.
  • Figure 3: Coordination conversion for the French phrase 'as well in red as in blue'. The dependencies following the UD conventions are represented in blue above the words; the alternative structure is represented in green below the words.
  • Figure 4: Distribution of differences between the UAS achieved on the UD and transformed corpora for the different languages and transformations considered. Positive differences indicate better results with UD annotations.
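
The inversion mentioned in the Figure 1 caption can be illustrated with a small sketch. It assumes that $w_i \curvearrowright w_j$ means $w_i$ is the head of $w_j$, and that a tree is stored as a mapping from each token index to its head index (0 for the root). The function `invert_dependency` is hypothetical. It performs only the basic head swap and does not implement the paper's correction for the non-projective cases shown in the figure.

```python
from typing import Dict


def invert_dependency(heads: Dict[int, int], i: int, j: int) -> Dict[int, int]:
    """Invert the arc i -> j so that j becomes the head of i.

    Assumption: heads[k] is the head of token k, with 0 as the artificial root.
    After inversion, j takes over i's former head, so j becomes the root of
    the sub-structure that i used to dominate. Other dependents of i are
    left attached to i in this simplified sketch.
    """
    if heads.get(j) != i:
        raise ValueError(f"token {i} is not the head of token {j}")
    new_heads = dict(heads)      # copy, so the input tree is left unchanged
    new_heads[j] = heads[i]      # j inherits i's former head
    new_heads[i] = j             # i now depends on j
    return new_heads


# Hypothetical 3-token tree: token 1 is the root and heads tokens 2 and 3.
tree = {1: 0, 2: 1, 3: 1}
inverted = invert_dependency(tree, 1, 2)
print(inverted)
```

In this toy tree, token 3 stays attached to token 1 after the swap. Whether such remaining dependents should be re-attached to the new head is precisely the kind of decision that can create or avoid non-projectivity.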