Deep Biaffine Attention for Neural Dependency Parsing
Timothy Dozat, Christopher D. Manning
TL;DR
The paper applies deep biaffine attention, with dimensionality-reducing MLPs in front of the biaffine classifiers, to a larger and more thoroughly regularized graph-based dependency parser. The parser reaches state-of-the-art or near state-of-the-art results on treebanks for six languages, including 95.7% UAS and 94.1% LAS on the English PTB, while keeping the simplicity of graph-based methods. A hyperparameter analysis shows how architecture choices, regularization, and optimizer settings affect parsing accuracy and speed. The results narrow the gap with the best transition-based parsers. The authors leave two problems to future work: closing the gap between labeled and unlabeled accuracy, and handling out-of-vocabulary (OOV) words better.
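For readers unfamiliar with the two metrics quoted above: UAS (unlabeled attachment score) is the percentage of tokens assigned the correct head, and LAS (labeled attachment score) additionally requires the correct dependency label. The sketch below is a minimal illustration with a hypothetical helper name (attachment_scores) and a made-up toy sentence. It does not reproduce the paper's evaluation script, and it ignores conventions such as punctuation handling, which vary between evaluations.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold, pred: lists of (head_index, label) pairs, one pair per token.
    A head index of 0 stands for the artificial ROOT node.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts tokens whose predicted head matches the gold head.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label both match.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * both_ok / n


# Toy example (invented for illustration): "She reads books".
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]  # right head, wrong label on token 3
uas, las = attachment_scores(gold, pred)
```

In this toy case every head is correct but one label is wrong, so LAS comes out lower than UAS. LAS can never exceed UAS, since every token counted for LAS is also counted for UAS.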
Abstract
This paper builds off recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser gets state-of-the-art or near state-of-the-art performance on standard treebanks for six different languages, achieving 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark (outperforming Kiperwasser & Goldberg (2016) by 1.8% UAS and 2.2% LAS) and comparable to the highest-performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches.
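The biaffine arc classifier mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The parameter names (W_dep, W_head, U, u), the single ReLU layer per MLP, and the toy dimensions are assumptions made for illustration, and random vectors stand in for the BiLSTM states. The idea shown is that each token's recurrent state is first reduced by two separate MLPs (one for its role as a dependent, one for its role as a head), and every dependent-head pair is then scored with a bilinear term plus a head-only bias term.

```python
import numpy as np


def mlp(x, W, b):
    """One ReLU layer that maps each token vector to a smaller vector."""
    return np.maximum(x @ W + b, 0.0)


def biaffine_arc_scores(R, params):
    """Score every (dependent, head) pair of one sentence.

    R: (n, d) array of recurrent states, one row per token (row 0 = ROOT).
    Returns S of shape (n, n), where S[i, j] scores token j as head of token i.
    """
    H_dep = mlp(R, params["W_dep"], params["b_dep"])     # (n, k) dependent views
    H_head = mlp(R, params["W_head"], params["b_head"])  # (n, k) head views
    bilinear = H_dep @ params["U"] @ H_head.T            # (n, n) pairwise term
    head_bias = H_head @ params["u"]                     # (n,) prior for each head
    return bilinear + head_bias[None, :]                 # add the bias to every row


# Toy setup: all sizes and values are arbitrary, chosen only to run the sketch.
rng = np.random.default_rng(0)
n, d, k = 5, 8, 4  # tokens (incl. ROOT), state size, reduced MLP size
params = {
    "W_dep": rng.normal(size=(d, k)), "b_dep": np.zeros(k),
    "W_head": rng.normal(size=(d, k)), "b_head": np.zeros(k),
    "U": rng.normal(size=(k, k)), "u": rng.normal(size=k),
}
R = rng.normal(size=(n, d))  # stand-in for BiLSTM output states

S = biaffine_arc_scores(R, params)
pred_heads = S[1:].argmax(axis=1)  # greedy head choice for each non-ROOT token
```

Taking the argmax of each row is the simplest decoding choice and does not guarantee a well-formed tree; a full parser would need an additional step to enforce that. A label classifier can follow the same pattern, with one score per label for each chosen dependent-head pair.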
