Leveraging Grammar Induction for Language Understanding and Generation

Jushi Kai; Shengyuan Hou; Yusheng Huang; Zhouhan Lin

TL;DR

This work introduces an unsupervised grammar induction method for language understanding and generation. A grammar parser induces constituency structures and dependency relations and is trained jointly with the downstream task, without additional syntax annotations.

Abstract

Grammar induction has made significant progress in recent years. However, it is not clear how applying induced grammar could improve practical performance in downstream tasks. In this work, we introduce an unsupervised grammar induction method for language understanding and generation. We construct a grammar parser that induces constituency structures and dependency relations and is trained jointly with downstream tasks, without additional syntax annotations. The induced grammar features are subsequently incorporated into the Transformer as a syntactic mask to guide self-attention. We evaluate our method on multiple machine translation tasks and natural language understanding tasks. Our method demonstrates superior performance compared to the original Transformer and other models enhanced with external parsers. Experimental results indicate that our method is effective in both from-scratch and pre-trained scenarios. Additionally, our research highlights the contribution of explicitly modeling the grammatical structure of texts to neural network models.

Paper Structure (30 sections, 12 equations, 2 figures, 8 tables, 1 algorithm)

Introduction
Preliminary
Syntactic Distance and Height
Syntactic distance
Syntactic height
Dependency Distribution
Method
Syntactic Mask
Syntax-guided Attention
BPE Embedding
Loss Function
Machine Translation
Datasets
Experiment Settings
Results
...and 15 more sections

Figures (2)

Figure 1: The pipeline for the construction of our syntactic mask. Word embeddings and BPE embeddings are utilized to induce the intermediate grammar features $s$, which are subsequently used to derive the syntactic distance $\tau$ and height $h$. The two vectors are leveraged to estimate the dependency distribution for the sentence, and generate the syntactic mask $P_D$. The mask is then employed to guide the self-attention mechanism within the encoders. These parsing modules are integrated into the Transformer model and trained together in downstream tasks.
Figure 2: Constituency trees for the example sentence produced by (a) Stanford CoreNLP, (b) our method without BPE embeddings, and (c) our method with BPE embeddings. "@@" in "gentle@@" and "bak@@" marks a BPE segmentation boundary.
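
The last step of the pipeline in Figure 1, where the syntactic mask $P_D$ guides self-attention, can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes the mask is a soft matrix with entries in [0, 1] that is multiplied element-wise into the softmax attention weights, and the function names (`softmax`, `masked_self_attention`) and the random mask are hypothetical placeholders. The exact formulation is given in the paper's Syntax-guided Attention section, which is not reproduced on this page.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_self_attention(q, k, v, syntactic_mask):
    """Single-head self-attention scaled by a soft syntactic mask.

    q, k, v: arrays of shape (n_tokens, d).
    syntactic_mask: array of shape (n_tokens, n_tokens) with entries in [0, 1];
        a stand-in for P_D (assumption: applied by element-wise product).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # standard scaled dot-product scores
    weights = softmax(scores, axis=-1)  # ordinary attention distribution
    guided = weights * syntactic_mask   # down-weight syntactically unlikely pairs
    return guided @ v, guided


# Toy inputs: 4 tokens, hidden size 8.
rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
q, k, v = (rng.standard_normal((n_tokens, d_model)) for _ in range(3))

# Hypothetical soft mask: each row is a distribution over candidate heads.
mask = softmax(rng.standard_normal((n_tokens, n_tokens)), axis=-1)

out, guided = masked_self_attention(q, k, v, mask)
```

With a mask of all ones the function reduces to ordinary scaled dot-product attention, which makes it easy to compare a guided model against the unguided baseline in this toy setting.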
