Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement

Nicolas Floquet, Joseph Le Roux, Nadi Tomeh, Thierry Charnois

TL;DR

The paper tackles scalability in graph-based dependency parsing by replacing separate arc and label scoring with an arc-centric vector representation that unifies scoring within a single network. It introduces a Transformer-based refinement over arc vectors to emulate higher-order dependencies while a filtering mechanism keeps attention memory tractable. Experiments on PTB and UD demonstrate improved accuracy and competitive speed, with state-of-the-art results on PTB and strong gains across most UD languages. This arc-vector framework enables better parameter sharing and can extend to other structured prediction tasks.

Abstract

We propose a novel architecture for graph-based dependency parsing that explicitly constructs vectors, from which both arcs and labels are scored. Our method addresses key limitations of the standard two-pipeline approach by unifying arc scoring and labeling into a single network, reducing scalability issues caused by the information bottleneck and lack of parameter sharing. Additionally, our architecture overcomes limited arc interactions with transformer layers to efficiently simulate higher-order dependencies. Experiments on PTB and UD show that our model outperforms state-of-the-art parsers in both accuracy and efficiency.
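The idea of scoring both arcs and labels from one explicit arc vector can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the authors' formulation: the additive head/dependent combination, the parameter names (W_head, W_dep, w_arc, W_label), and the top-k filter, which merely stands in for whatever filtering mechanism the paper uses before its transformer refinement.

```python
import numpy as np

def arc_vectors(H, W_head, W_dep):
    """Build one vector per candidate arc (illustrative construction).

    H: (n, d) contextual word embeddings.
    Returns V of shape (n, n, k), where V[h, m] represents the arc h -> m.
    """
    heads = H @ W_head  # (n, k) head-role projection
    deps = H @ W_dep    # (n, k) dependent-role projection
    # Assumed combination: broadcast sum followed by a nonlinearity.
    return np.tanh(heads[:, None, :] + deps[None, :, :])

def score_arcs_and_labels(V, w_arc, W_label):
    """Score arcs and labels from the same arc vectors."""
    arc_scores = V @ w_arc      # (n, n): one score per arc
    label_scores = V @ W_label  # (n, n, L): one score per arc and label
    return arc_scores, label_scores

def keep_top_k_heads(arc_scores, k):
    """Hypothetical filter: keep the k best-scoring heads per dependent."""
    n = arc_scores.shape[1]
    best = np.argsort(-arc_scores, axis=0)[:k]  # (k, n) head indices per column
    mask = np.zeros_like(arc_scores, dtype=bool)
    mask[best, np.arange(n)] = True
    return mask

rng = np.random.default_rng(0)
n, d, k, L = 5, 8, 6, 3  # toy sizes: words, embedding dim, arc dim, labels
H = rng.normal(size=(n, d))
V = arc_vectors(H, rng.normal(size=(d, k)), rng.normal(size=(d, k)))
arc_scores, label_scores = score_arcs_and_labels(
    V, rng.normal(size=k), rng.normal(size=(k, L))
)
mask = keep_top_k_heads(arc_scores, 2)
survivors = V[mask]  # arc vectors that a refinement step would attend over
```

With a filter of this kind, attention would run over roughly n times k surviving arc vectors instead of all n squared candidates, which is the sort of saving that keeps attention memory manageable.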

Paper Structure (33 sections, 5 equations, 13 figures, 4 tables)

Figures (13)

  • Figure 1: Illustration of both models. Left: standard model with 2 (resp. 3) pipelines for Loc (resp. CRF2O) with shared word embeddings. Right: our proposal with a single pipeline and optionally $P$ transformers.
  • Figure 2: French error rates for words where one system has at least three times the error rate of another.
  • Figure 3: English error rates for words where one system has at least three times the error rate of another.
  • Figure 4: French error rates by attachment distance.
  • Figure 5: English error rates by attachment distance.
  • ...and 8 more figures