Deep Biaffine Attention for Neural Dependency Parsing

Timothy Dozat, Christopher D. Manning

TL;DR

The paper advances neural dependency parsing by applying deep biaffine attention, in which MLPs first reduce the dimensionality of the recurrent states, within a larger, more thoroughly regularized graph-based parser. The parser reaches state-of-the-art or near state-of-the-art results on treebanks for six languages, including 95.7% UAS and 94.1% LAS on English PTB, while keeping the simplicity of graph-based methods. A hyperparameter analysis shows which choices of architecture, regularization, and optimizer settings affect parsing accuracy and speed. The work narrows the gap with the best transition-based parsers; future work targets the gap between labeled and unlabeled accuracy and the handling of out-of-vocabulary (OOV) words.

Abstract

This paper builds on recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser gets state-of-the-art or near state-of-the-art performance on standard treebanks for six different languages, achieving 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark (outperforming Kiperwasser & Goldberg (2016) by 1.8% and 2.2%) and comparable to the highest-performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches.
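The two metrics quoted in the abstract can be made concrete with a small sketch. UAS (unlabeled attachment score) is the fraction of tokens whose predicted head is correct; LAS (labeled attachment score) additionally requires the dependency label to match. The function name `attachment_scores`, the `(head, label)` tuple layout, and the label names below are illustrative assumptions, not taken from the paper, and the sketch ignores evaluation details such as punctuation handling.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one or more sentences.

    gold, pred: equal-length lists of (head_index, label) tuples,
    one tuple per token. Head index 0 stands for the artificial ROOT.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    # UAS counts tokens whose head matches; LAS also requires the label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


# Toy sentence "Casey hugged Kim" with hypothetical labels.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj")]  # right head, wrong label on "Kim"

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.3f} LAS={las:.3f}")  # prints "UAS=1.000 LAS=0.667"
```

Because a correct label only counts when the head is also correct, LAS can never exceed UAS, which is why the abstract's LAS figures sit below the corresponding UAS figures.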

Paper Structure

This paper contains 13 sections, 3 equations, 1 figure, 5 tables.

Figures (1)

  • Figure 1: BiLSTM with deep biaffine attention to score each possible head for each dependent, applied to the sentence "Casey hugged Kim". We reverse the order of the biaffine transformation here for clarity.
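The scoring step named in the Figure 1 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the recurrent states `R` are random stand-ins for BiLSTM outputs, the MLPs are single ReLU layers, the sizes are toy values, and all function and parameter names (`mlp`, `biaffine_arc_scores`, `W_dep`, `U`, `u`, and so on) are hypothetical. The score for dependent i and candidate head j is taken to be a bilinear term between the two reduced vectors plus a head-only bias term.

```python
import numpy as np


def mlp(x, W, b):
    # Single-layer MLP with ReLU (activation choice is an assumption);
    # it projects recurrent states to a smaller dimension.
    return np.maximum(0.0, x @ W + b)


def biaffine_arc_scores(R, params):
    """Score every candidate head for every dependent.

    R: (n, d) array of per-token recurrent states.
    Returns S with S[i, j] = h_head[j] @ U @ h_dep[i] + h_head[j] @ u.
    """
    H_dep = mlp(R, params["W_dep"], params["b_dep"])     # (n, k)
    H_head = mlp(R, params["W_head"], params["b_head"])  # (n, k)
    bilinear = H_dep @ params["U"].T @ H_head.T          # (n, n)
    head_bias = H_head @ params["u"]                     # (n,)
    return bilinear + head_bias[None, :]


# Toy sizes: 3 tokens ("Casey hugged Kim"), state size 8, reduced size 4.
rng = np.random.default_rng(0)
n, d, k = 3, 8, 4
R = rng.normal(size=(n, d))  # stand-in for BiLSTM outputs
params = {
    "W_dep": rng.normal(size=(d, k)), "b_dep": np.zeros(k),
    "W_head": rng.normal(size=(d, k)), "b_head": np.zeros(k),
    "U": rng.normal(size=(k, k)), "u": rng.normal(size=k),
}

scores = biaffine_arc_scores(R, params)
predicted_heads = scores.argmax(axis=1)  # greedy head choice per dependent
print(scores.shape, predicted_heads.shape)
```

The greedy `argmax` is only for illustration: it does not guarantee a well-formed tree, and this sketch omits the ROOT token and the separate label classifier.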