Tree-Transformer: A Transformer-Based Method for Correction of Tree-Structured Data

Jacob Harer, Chris Reale, Peter Chin

TL;DR

This work introduces the Tree-Transformer, a Transformer-based architecture that operates directly on tree-structured data by replacing the Transformer feed-forward sublayers with a Parent-Sibling Tree Convolution Block. It applies this architecture to two correction tasks: code repair using Abstract Syntax Trees and grammatical error correction using Constituency Parse Trees, achieving substantial improvements over sequential baselines and state-of-the-art performance on benchmarks such as CoNLL 2014 and AESW. Key contributions include a novel tree-aware attention mechanism with top-down encoding/decoding, depth-first generation of tree structures, and effective training strategies (pretraining, denoising autoencoding, and edit-weighted losses) that yield strong recall and high F0.5 scores. The results demonstrate the practical benefits of explicitly modeling tree structure for grammar-aware correction and suggest broader applicability to tree- and graph-to-graph translation tasks.

Abstract

Many common sequential data sources, such as source code and natural language, have a natural tree-structured representation. These trees can be generated by fitting a sequence to a grammar, yielding a hierarchical ordering of the tokens in the sequence. This structure encodes a high degree of syntactic information, making it ideal for problems such as grammar correction. However, little work has been done to develop neural networks that can operate on and exploit tree-structured data. In this paper we present the Tree-Transformer, a novel neural network architecture designed to translate between arbitrary input and output trees. We applied this architecture to correction tasks in both the source code and natural language domains. On source code, our model achieved an improvement of $25\%$ $F_{0.5}$ over the best sequential method. On natural language, we achieved results comparable to the most complex state-of-the-art systems, obtaining a $10\%$ improvement in recall on the CoNLL 2014 benchmark and the highest $F_{0.5}$ score to date on the AESW benchmark, $50.43$.
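
The abstract reports results as $F_{0.5}$ scores. As a reminder of what that metric measures, here is a minimal sketch of the standard F-beta formula, $F_\beta = (1+\beta^2)\,PR / (\beta^2 P + R)$, with $\beta = 0.5$, which weights precision more heavily than recall. The function name `f_beta` and the sample precision/recall values are illustrative assumptions and are not taken from the paper or its evaluation scripts.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall.

    Hypothetical helper for illustration, not the paper's evaluation code.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both terms vanish
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only: swapping precision and recall changes F0.5,
# because precision carries more weight when beta = 0.5.
high_precision = f_beta(precision=1.0, recall=0.5)
high_recall = f_beta(precision=0.5, recall=1.0)
print(high_precision > high_recall)
```

Because correction systems are usually judged more harshly for introducing wrong edits than for missing some errors, a precision-weighted score of this kind is a common choice for such benchmarks.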

Paper Structure

This paper contains 24 sections, 4 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Tree-Transformer model architecture.
  • Figure 2: Tree-Transformer state transfer.
  • Figure 3: Example Constituency Parse Tree. The index of the node in depth-first ordering is shown in the bottom left of each node. Note: leaf nodes in the verb phrase do not have access to leaf nodes in the left noun phrase without self-attention.
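
The depth-first ordering mentioned in the Figure 3 caption can be sketched as a pre-order traversal that numbers each node the first time it is visited. This is a minimal illustration only: the `Node` class, the function name, the example sentence, and the choice to start indices at 0 are all assumptions, since the figure itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical parse-tree node; not the paper's data structure."""
    label: str
    children: List["Node"] = field(default_factory=list)
    index: int = -1  # filled in by the traversal below


def assign_depth_first_indices(root: Node) -> List[Node]:
    """Number nodes in depth-first (pre-order), left-to-right order."""
    order: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        node.index = len(order)  # assumed 0-based; the figure may differ
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return order


# Assumed toy constituency tree for "the cat sat".
tree = Node("S", [
    Node("NP", [Node("DT", [Node("the")]), Node("NN", [Node("cat")])]),
    Node("VP", [Node("VBD", [Node("sat")])]),
])
order = assign_depth_first_indices(tree)
print([(n.index, n.label) for n in order])
```

In this ordering the whole noun phrase subtree is numbered before the verb phrase begins, which matches the left-to-right, top-down generation order described in the TL;DR.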