From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages

Katsuhiko Hayashi, Hidetaka Kamigaito

Abstract

We prove that all standard subregular language classes are linearly separable when represented by their deciding predicates. This establishes finite observability and guarantees learnability with simple linear models. Synthetic experiments confirm perfect separability under noise-free conditions, while real-data experiments on English morphology show that the learned features align with well-known linguistic constraints. These results demonstrate that the subregular hierarchy provides a rigorous and interpretable foundation for modeling natural language structure. The code for the real-data experiments is available at https://github.com/UTokyo-HayashiLab/subregular.

Paper Structure

This paper contains 44 sections, 3 theorems, 8 equations, 4 figures, 1 table.

Key Result

Lemma 3.1

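For any $S\subseteq\{0,1\}^n$, there exist weights $w\in\mathbb{R}^{2^n}$ and a bias $b\in\mathbb{R}$ such that the linear function defined by $(w,b)$ over the $2^n$ minterm indicators of $x$ is positive if $x\in S$ and negative otherwise. Moreover, the separating hyperplane attains functional margin at least $1/2$; the geometric margin equals $1/(2\|w\|_2)$.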
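The margin claims in Lemma 3.1 can be checked numerically on a small case. The sketch below assumes one natural construction that is consistent with the stated margins: a one-hot minterm feature map, weight 1 on every minterm belonging to $S$, and bias $-1/2$. The function names and the example set `S` are hypothetical, and this is not necessarily the construction used in the paper's proof.

```python
from itertools import product
import math

def minterm_index(x):
    """Read the bit tuple x as a binary number (position of its minterm)."""
    idx = 0
    for bit in x:
        idx = 2 * idx + bit
    return idx

def minterm_features(x):
    """One-hot vector of length 2^n: exactly one minterm is true for x."""
    phi = [0.0] * (2 ** len(x))
    phi[minterm_index(x)] = 1.0
    return phi

def minterm_separator(S, n):
    """Assumed construction: weight 1 on minterms in S, bias -1/2."""
    w = [0.0] * (2 ** n)
    for x in S:
        w[minterm_index(x)] = 1.0
    return w, -0.5

def score(w, b, x):
    """Value of the linear function w . phi(x) + b."""
    return sum(wi * fi for wi, fi in zip(w, minterm_features(x))) + b

n = 3
S = {(0, 0, 1), (1, 1, 0), (1, 1, 1)}  # arbitrary example subset of {0,1}^3
w, b = minterm_separator(S, n)

# Signed margin y * (w . phi(x) + b) for every point of the cube.
margins = []
for x in product((0, 1), repeat=n):
    y = 1 if x in S else -1
    margins.append(y * score(w, b, x))

functional_margin = min(margins)
w_norm = math.sqrt(sum(wi * wi for wi in w))
geometric_margin = functional_margin / w_norm  # requires S to be non-empty

print(functional_margin)
print(geometric_margin, 1 / (2 * w_norm))
```

Under this construction every point scores exactly $+1/2$ or $-1/2$, so the functional margin is $1/2$ and dividing by $\|w\|_2$ gives $1/(2\|w\|_2)$. The price is a feature space that grows as $2^n$ in the number of predicates.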

Figures (4)

  • Figure 1: Accuracy of synthetic classification: (A) $\mathsf{SL}_3$, (B) $\mathsf{SP}_2$ and (C) $\mathsf{LTT}_2$ under label noise, (D) $\mathsf{SL}_3$, (E) $\mathsf{SP}_2$ and (F) $\mathsf{LTT}_2$ with varying training sizes.
  • Figure 2: Confusion matrix for the morphological well-formedness classification task, showing accurate separation of well-formed (positive) and ill-formed (negative) affix sequences.
  • Figure 3: Top-weighted predicates learned by the classifier on the morphological dataset, illustrating linguistically meaningful affix constraints such as boundary-sensitive suffixes and prefix–suffix co-occurrence restrictions.
  • Figure 4: Histogram of normalized margins on the test set. Most examples lie at positive margins, confirming effective linear separation.

Theorems & Definitions (4)

  • Lemma 3.1: Minterm Linearization
  • Theorem 3.2
  • Theorem 4.1: Regular language not representable by a fixed $P$
  • proof