Auxiliary Tasks to Boost Biaffine Semantic Dependency Parsing

Marie Candito

TL;DR

The paper tackles the lack of inter-arc interdependence in a biaffine SDP parser by introducing auxiliary tasks that predict token-level properties such as the number of heads and incoming label sets. Through multi-task learning with uncertainty-weighted losses and stack propagation, these tasks influence arc and label scoring without sacrificing $O(n^2)$ complexity. Experiments on English SemEval2015 Task 18 and French deep syntactic graphs show modest but statistically significant improvements, with consistent gains across in-domain and out-of-domain settings. The method remains robust across languages and graph types, providing a simple, effective boost to SDP performance that can complement transformer-based representations.
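The auxiliary targets are token-level properties that can be read directly off a gold graph. The sketch below is a hypothetical illustration, not the paper's code. It assumes a graph given as (head, dependent, label) triples and derives each token's number of heads and its set of incoming labels. It then combines per-task losses with a common simplified form of uncertainty weighting in the style of Kendall et al. (2018), in which each task i has a learned log-variance s_i. The names `auxiliary_targets` and `uncertainty_weighted_loss`, the toy labels, and the exact weighting formula are assumptions; the paper's own formulation may differ.

```python
import math
from typing import FrozenSet, List, Set, Tuple

# Hypothetical graph encoding: (head index, dependent index, label).
Arc = Tuple[int, int, str]


def auxiliary_targets(n_tokens: int, arcs: List[Arc]) -> Tuple[List[int], List[FrozenSet[str]]]:
    """Derive per-token auxiliary targets from a gold semantic graph.

    Returns, for each token, its number of heads and the set of labels
    on its incoming arcs (an empty set for tokens with no head).
    """
    n_heads = [0] * n_tokens
    incoming: List[Set[str]] = [set() for _ in range(n_tokens)]
    for _head, dep, label in arcs:
        n_heads[dep] += 1
        incoming[dep].add(label)
    return n_heads, [frozenset(labels) for labels in incoming]


def uncertainty_weighted_loss(task_losses: List[float], log_vars: List[float]) -> float:
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i (assumed simplified variant).

    s_i = log(sigma_i^2) would be a learned parameter per task (main arc and
    label tasks plus auxiliary tasks). The + s_i term penalizes large sigma_i,
    preventing the trivial solution of down-weighting every task.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


# Toy graph over 4 tokens (labels are illustrative only).
arcs = [(3, 0, "ARG1"), (2, 0, "ARG2"), (3, 1, "ARG1"), (1, 2, "ARG1")]
n_heads, label_sets = auxiliary_targets(4, arcs)
print(n_heads)  # [2, 1, 1, 0]
```

With all log-variances at 0, the combination reduces to a plain sum of task losses; training the s_i lets the model rebalance the main and auxiliary tasks instead of relying on hand-tuned weights.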

Abstract

The biaffine parser of Dozat and Manning (2017) was successfully extended to semantic dependency parsing (SDP) (Dozat and Manning, 2018). Its performance on graphs is surprisingly high given that, without the constraint of producing a tree, all arcs for a given sentence are predicted independently of each other (modulo a shared representation of tokens). To circumvent this independence of decisions, while retaining the O(n^2) complexity and highly parallelizable architecture, we propose simple auxiliary tasks that introduce some form of interdependence between arcs. Experiments on the three English acyclic datasets of SemEval 2015 task 18 (Oepen et al., 2015) and on French deep syntactic cyclic graphs (Ribeyre et al., 2014) show modest but systematic performance gains over a near state-of-the-art baseline using transformer-based contextualized representations. This provides a simple and robust method to boost SDP performance.
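The independence of arc decisions can be made concrete. Below is a hypothetical pure-Python sketch, not the paper's implementation: a biaffine scorer of the form head^T U dep + w·[head; dep] + b (roughly the shape of Dozat and Manning's scorer, with toy one-dimensional vectors) is applied to every ordered token pair, and an arc is kept whenever its sigmoid probability exceeds 0.5. Each decision looks only at its own score, so nothing links one arc to another; scoring all n(n-1) pairs is what gives the O(n^2) cost. The function names, the 0.5 threshold, and the exclusion of self-loops are assumptions of this sketch.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]
Matrix = List[List[float]]


def biaffine_score(head: Vector, dep: Vector, U: Matrix, w: Vector, b: float) -> float:
    """Score the candidate arc head -> dep as head^T U dep + w . [head; dep] + b."""
    bilinear = sum(
        head[a] * U[a][c] * dep[c] for a in range(len(head)) for c in range(len(dep))
    )
    linear = sum(wi * xi for wi, xi in zip(w, head + dep))
    return bilinear + linear + b


def predict_arcs(
    H_head: List[Vector], H_dep: List[Vector], U: Matrix, w: Vector, b: float,
    threshold: float = 0.5,
) -> Set[Tuple[int, int]]:
    """Predict arcs (head, dep) independently: one sigmoid decision per token pair.

    No decision reads any other decision, so there is no tree (or other
    global) constraint; n * (n - 1) pairs are scored, hence O(n^2).
    """
    n = len(H_head)
    arcs: Set[Tuple[int, int]] = set()
    for i in range(n):          # candidate head
        for j in range(n):      # candidate dependent
            if i == j:
                continue        # self-loops excluded (an assumption of this sketch)
            s = biaffine_score(H_head[i], H_dep[j], U, w, b)
            if 1.0 / (1.0 + math.exp(-s)) > threshold:
                arcs.add((i, j))
    return arcs


# Toy 1-dimensional token representations; U = [[1]], no linear term.
H = [[1.0], [-1.0], [0.5]]
print(sorted(predict_arcs(H, H, [[1.0]], [0.0, 0.0], 0.0)))  # [(0, 2), (2, 0)]
```

The toy output contains a 2-cycle between tokens 0 and 2. Cycles are legitimate in some graph formalisms (e.g. the French deep syntactic graphs), but the example shows that neither arc was decided with any knowledge of the other.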

Paper Structure

This paper contains 19 sections, 4 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Top: English semantic graph in the DM format, from the SemEval2015-Task18 dataset (Oepen et al., 2015). Bottom: French deep syntactic graph as defined by Candito et al. (2014).
  • Figure 2: Example of competition for the sequence "rule of thumb". Arcs above: correct MWE (multiword expression) analysis, in which "rule" and "of" are attached to the last MWE component "thumb", and "thumb" is the head of the sequence. Arcs below: incorrect compositional analysis, in which "rule" is the head and is, for instance, wrongly attached as ARG1 of "good" (in red).