Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement

Nicolas Floquet; Joseph Le Roux; Nadi Tomeh; Thierry Charnois

TL;DR

The paper tackles scalability in graph-based dependency parsing by replacing separate arc and label scoring with an arc-centric vector representation that unifies scoring within a single network. It introduces a Transformer-based refinement over arc vectors to emulate higher-order dependencies while a filtering mechanism keeps attention memory tractable. Experiments on PTB and UD demonstrate improved accuracy and competitive speed, with state-of-the-art results on PTB and strong gains across most UD languages. This arc-vector framework enables better parameter sharing and can extend to other structured prediction tasks.

Abstract

We propose a novel architecture for graph-based dependency parsing that explicitly constructs a vector for each candidate arc, from which both arcs and labels are scored. Our method addresses key limitations of the standard two-pipeline approach by unifying arc scoring and labeling into a single network, which reduces the scalability issues caused by the information bottleneck and the lack of parameter sharing. Additionally, our architecture overcomes limited arc interactions by applying transformer layers to the arc vectors, efficiently simulating higher-order dependencies. Experiments on PTB and UD show that our model outperforms state-of-the-art parsers in both accuracy and efficiency.
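To make the single-pipeline idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds one vector per (head, dependent) pair and scores both arcs and labels from that same vector. The way the two word representations are combined (a sum of two linear projections followed by tanh), all parameter names, and the random inputs are assumptions chosen for illustration; this summary does not specify the paper's actual parameterization.

```python
import numpy as np

def arc_vectors(words, W_head, W_dep, bias):
    """Build one vector per (head, dependent) pair.

    words: (n, d) word representations.
    Returns an (n, n, r) array; entry [h, m] represents the arc h -> m.
    The sum-then-tanh combination is an illustrative assumption.
    """
    head_part = words @ W_head   # (n, r), each word viewed as a head
    dep_part = words @ W_dep     # (n, r), each word viewed as a dependent
    return np.tanh(head_part[:, None, :] + dep_part[None, :, :] + bias)

def score_arcs_and_labels(arcs, w_arc, W_label):
    """Score arcs and labels from the same arc vectors."""
    arc_scores = arcs @ w_arc       # (n, n), one score per candidate arc
    label_scores = arcs @ W_label   # (n, n, L), one score per arc and label
    return arc_scores, label_scores

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n, d, r, num_labels = 5, 8, 16, 3
words = rng.normal(size=(n, d))
W_head = rng.normal(size=(d, r))
W_dep = rng.normal(size=(d, r))
bias = np.zeros(r)
w_arc = rng.normal(size=r)
W_label = rng.normal(size=(r, num_labels))

arcs = arc_vectors(words, W_head, W_dep, bias)
arc_scores, label_scores = score_arcs_and_labels(arcs, w_arc, W_label)

# Greedy decoding: best head per dependent, then best label for that arc.
# This does not guarantee a well-formed tree; a real parser would use a
# tree-constrained decoder.
heads = arc_scores.argmax(axis=0)
labels = label_scores[heads, np.arange(n)].argmax(axis=-1)
```

Because both scorers read the same `arcs` tensor, any refinement applied to the arc vectors (for example, the transformer layers mentioned in the abstract) would affect arc and label scores together. That shared input is the parameter-sharing property the abstract contrasts with the two-pipeline design.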

Paper Structure (33 sections, 5 equations, 13 figures, 4 tables)

Introduction
Model
Standard Model
Single Pipeline Model
Refining with Attention
Filtered Attention
Experiments
Data
Evaluation
Models
Main Results
Related Work
Conclusion
Limitations
Ethical Considerations
...and 18 more sections

Figures (13)

Figure 1: Illustration of both models. Left: standard model with 2 (resp. 3) pipelines for Loc (resp. CRF2O) with shared word embeddings. Right: our proposal with a single pipeline and optionally $P$ transformers.
Figure 2: French error rates for words where one system has at least three times the error rate of another.
Figure 3: English error rates for words where one system has at least three times the error rate of another.
Figure 4: French error rates by attachment distance.
Figure 5: English error rates by attachment distance.
...and 8 more figures
