Structural Optimization Ambiguity and Simplicity Bias in Unsupervised Neural Grammar Induction

Jinwook Park; Kangil Kim

TL;DR

This paper tackles challenges in unsupervised neural grammar induction, namely structural optimization ambiguity (SOA) and structural simplicity bias (SSB), arising when training across all possible parses. It proposes sentence-wise parse-focusing, restricting loss to a small set of parses per sentence, and bias generation from pre-trained unsupervised parsers, including heterogeneous multi-parsers, to steer parsing decisions. On PTB and multilingual benchmarks, the method yields higher parsing accuracy with reduced variance and less tendency toward overly simple parses, while promoting more diverse rule usage. This approach advances the development of compact, accurate, and interpretable explicit grammars in unsupervised settings and highlights the value of combining cross-model biases with data-driven focus.

Abstract

Neural parameterization has significantly advanced unsupervised grammar induction. However, training these models with a traditional likelihood loss for all possible parses exacerbates two issues: 1) $\textit{structural optimization ambiguity}$ that arbitrarily selects one among structurally ambiguous optimal grammars despite the specific preference of gold parses, and 2) $\textit{structural simplicity bias}$ that leads a model to underutilize rules to compose parse trees. These challenges subject unsupervised neural grammar induction (UNGI) to inevitable prediction errors, high variance, and the necessity for extensive grammars to achieve accurate predictions. This paper tackles these issues, offering a comprehensive analysis of their origins. As a solution, we introduce $\textit{sentence-wise parse-focusing}$ to reduce the parse pool per sentence for loss evaluation, using the structural bias from pre-trained parsers on the same dataset. In unsupervised parsing benchmark tests, our method significantly improves performance while effectively reducing variance and bias toward overly simplistic parses. Our research promotes learning more compact, accurate, and consistent explicit grammars, facilitating better interpretability.
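The contrast between the all-parse likelihood loss and a parse-focused loss can be illustrated with a toy example. The sketch below is not the paper's implementation: the two grammars `G1` and `G2`, their parse probabilities, and the helper `nll` are hypothetical, and the focused pool is assumed to come from some pre-trained parser. It only shows that two grammars with opposite parse preferences can receive the same loss when all parses are summed, while a loss restricted to a selected parse separates them.

```python
import math

# Hypothetical joint probabilities p(tree, sentence) for one three-word
# sentence under two toy grammars. These numbers are illustrative only.
G1 = {"((a b) c)": 0.08, "(a (b c))": 0.02}  # prefers the left-branching parse
G2 = {"((a b) c)": 0.02, "(a (b c))": 0.08}  # prefers the right-branching parse


def nll(parse_probs, pool=None):
    """Negative log of the probability mass summed over `pool`.

    With pool=None the sum runs over every parse, which corresponds to the
    usual sentence-likelihood loss. With a restricted pool, only the
    selected parses contribute (an assumed form of a parse-focused loss).
    """
    trees = parse_probs.keys() if pool is None else pool
    return -math.log(sum(parse_probs[t] for t in trees))


# Loss over all parses: both grammars give sentence probability 0.10,
# so the loss cannot tell them apart.
full_g1, full_g2 = nll(G1), nll(G2)

# Loss over a focused pool (assumed to be suggested by a pre-trained parser):
# the grammar that agrees with the pool now has the lower loss.
focused_pool = ["((a b) c)"]
foc_g1, foc_g2 = nll(G1, focused_pool), nll(G2, focused_pool)

print(f"all parses: G1={full_g1:.3f}  G2={full_g2:.3f}")
print(f"focused:    G1={foc_g1:.3f}  G2={foc_g2:.3f}")
```

In this toy setting the all-parse loss is indifferent between `G1` and `G2`, which mirrors the ambiguity described in the abstract, whereas the focused loss favors `G1` because it places more mass on the selected parse.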

Paper Structure (61 sections, 10 equations, 10 figures, 8 tables, 1 algorithm)

Introduction
Background
Notations of Probabilistic Context-Free Grammar
Negative Log Likelihood Loss
Limits on Unsupervised Learning of Neural Grammar Induction
Structural Optimization Ambiguity (SOA)
Definition of Structure Optimization Ambiguity by UGI Loss
Arbitrary Convergence by Structural Optimization Ambiguity
Proof in Single Pre-terminal Condition
Empirical Evidence in General Condition: Low Correlation of Loss to S-F1
Empirical Evidence in General Condition: Variance Amplification by Symbol Size
Structural Simplicity Bias (SSB)
What is SSB?
Specific Conditions of UGI
Rule Simplification for Expressing a Parse
...and 46 more sections

Figures (10)

Figure 1: The optimization of two different grammars, $G_1$ and $G_2$, in the optimization landscape for a given sentence. Each grammar has different preferences for parse trees based on parse tree probability; however, they have the same sentence probability. The two initial points converge to the different optima close to their own.
Figure 2: (a) Low correlation between S-F1 score and UGI likelihood on the PTB test set. We evaluate trained FGG-TNPCFGs with 4500 nonterminal symbols and 9000 preterminal symbols using 32 random seeds over 10 epochs. (b) Variance amplification as the number of symbols increases. Circles are the same as in (a); the other points are trained with the same settings as in (a) except for the number of symbols.
Figure 3: The average number of unique rules in each parse for each sentence length. We evaluate the average for models with 5 / 15 / 30 nonterminals trained with four different random seeds. These evaluations use the WSJ train set. (a) The gold uses non-binarized gold parses. (b) Parse-focused N-PCFGs use models trained by Structformer, NBL-PCFG, and FGG-TNPCFG.
Figure 4: Overview of N-PCFG (above) and Ours (below). The red box shows the parse-focusing method.
Figure 5: Frequency distribution sorted in descending order for contained rules in the parsed training dataset. The colored fill area represents the frequency difference between FGG-TNPCFGs and our method with 5 nonterminals. The small box zooms in on the top 10 rules for differences.
...and 5 more figures

Theorems & Definitions (1)

Definition 1
