Around Context-Free Grammars -- a Normal Form, a Representation Theorem, and a Regular Approximation

Liliana Cojocaru

TL;DR

The paper introduces Dyck normal form as a syntactic refinement of Chomsky normal form that enforces paired, bracket-like right-hand sides, yielding nested derivation trees and a natural homomorphism to the original grammar. Building on this, it proves a representation L = φ(D'_K) via trace-words and Dyck languages, and gives a graphical, transition-based proof of the Chomsky–Schützenberger theorem: dependency graphs and an extended dependency graph produce a regular language whose intersection with a Dyck language characterizes the language of the grammar. It then refines this construction to obtain a thinner regular language R_m and outlines a method to derive a regular grammar G_r whose language is a superset approximation of the original, L ⊆ L(G_r). The result is a graphically constructive framework linking context-free grammars, Dyck languages, and the Chomsky–Schützenberger theorem, offering systematic, though nonoptimal, regular approximations with potential applications in parsing and language description.

Abstract

We introduce a normal form for context-free grammars, called Dyck normal form. This is a syntactical restriction of the Chomsky normal form, in which the two nonterminals occurring on the right-hand side of a rule are paired nonterminals. This pairwise property allows us to define a homomorphism from Dyck words to words generated by a grammar in Dyck normal form. We prove that for each context-free language L, there exist an integer K and a homomorphism h such that L=h(D'_K), where D'_K is a subset of the one-sided Dyck language over K letters. Through a transition-like diagram for a context-free grammar in Dyck normal form, we effectively build a regular language R that satisfies the Chomsky-Schützenberger theorem. Using graphical approaches, we refine R such that the Chomsky-Schützenberger theorem still holds. Based on this readjustment, we sketch a transition diagram for a regular grammar that generates a regular superset approximation for the initial context-free language.
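As a small illustration of the kind of representation the abstract refers to, the sketch below checks the classical Chomsky-Schützenberger shape L = h(D ∩ R) on the textbook language {a^n b^n | n ≥ 0}. It is not the paper's construction: the single bracket pair, the regular language R = [*]*, the homomorphism h, and all function names are assumptions chosen for this example, and the Dyck language used here is the ordinary one-sided Dyck language rather than the paper's subset D'_K.

```python
import re
from itertools import product

# Assumed bracket alphabet for the illustration: K = 1 pair.
PAIRS = {"[": "]"}

def is_dyck(word, pairs=PAIRS):
    """Return True if `word` is well nested over the given bracket pairs."""
    stack = []
    closers = set(pairs.values())
    for symbol in word:
        if symbol in pairs:
            # Opening bracket: remember which closer must match it.
            stack.append(pairs[symbol])
        elif symbol in closers:
            # Closing bracket: it must match the most recent opener.
            if not stack or stack.pop() != symbol:
                return False
        else:
            return False  # symbol outside the bracket alphabet
    return not stack

# Assumed regular language R: all openers first, then all closers.
R = re.compile(r"\[*\]*")

# Assumed homomorphism h: "[" -> "a", "]" -> "b".
H = {"[": "a", "]": "b"}

def h(word):
    """Apply the letter-to-letter homomorphism H to a bracket word."""
    return "".join(H[s] for s in word)

def represented_words(max_len):
    """Enumerate h(D ∩ R) restricted to bracket words of length <= max_len."""
    out = set()
    for n in range(max_len + 1):
        for letters in product("[]", repeat=n):
            w = "".join(letters)
            if is_dyck(w) and R.fullmatch(w):
                out.add(h(w))
    return out

if __name__ == "__main__":
    print(sorted(represented_words(6), key=len))
```

Intersecting with R discards well-nested words such as `[][]`, so only words of the form `[^n ]^n` survive and h maps them to a^n b^n. The brute-force enumeration is exponential in `max_len` and is meant only for checking small cases.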

Paper Structure

This paper contains 6 sections, 9 theorems, 1 equation, 5 figures.

Key Result

Theorem 1.2

For each context-free grammar $G=(N, T, P, S)$ there exists a grammar $G'=(N', T, P', S)$ such that $L(G)=L(G')$ where $G'$ is in Dyck normal form.

Figures (5)

  • Figure 1: a. The dependency graph ${\cal G}^S$ of grammar $G$ in Example 1. b. The extended dependency graph of $G$. Edges colored in orange extend $\cal G$ to ${\cal G}_e$. c. The transition diagram ${\cal A}_e$ (see Example 5.1 a.) built from ${\cal G}_e$. Each bracket $[_i$ ($S$, $]_i$) in ${\cal A}_e$ corresponds to state $s_{[_i}$ ($s_S$, $s_{]_i}$). In all graphs $S$ is the initial vertex. In a. - b. the vertex colored in blue is the final vertex.
  • Figure 2: a. - d. The dependency graphs of the context-free grammar $G$ in Example 3.7. e. The extended dependency graph of $G$. In all graphs, vertices colored in red are initial vertices, while vertices colored in blue are final vertices. Edges colored in orange in d. emphasize symmetrical structures obtained by linking the dependency graphs to one another.
  • Figure 3: a. - e. Graphs associated with regular expressions in ${\cal P}.e$ (Example 4.2). Initial vertices are colored in red, final vertices in blue, while purple vertices mark a core segment. $\bar{]}^4_7$ is a marked vertex to allow the plus-loop $([^4_3]^4_7)^+$.
  • Figure 4: The refined dependency graph of the context-free grammar in Examples 3.7 and 4.2. $S$ is the initial vertex, vertices colored in green are final vertices, vertices colored in blue are dummy vertices, and vertices colored in purple mark a core segment. Orange edges emphasize symmetrical structures built with respect to the structure of the trace language. Green edges are glue edges.
  • Figure 5: The transition diagram ${\cal A}_e$ built from ${\cal G}.e^S$ in Example 4.2. Each bracket $[_i$ ($S$, $]_i$) in ${\cal A}_e$ corresponds to the state $s_{[_i}$ ($s_S$, $s_{]_i}$) (see Example 5.1 b.). $S$ is the initial vertex, vertices colored in green lead to the final state.

Theorems & Definitions (22)

  • Definition 1.1
  • Theorem 1.2
  • Corollary 1.3
  • Corollary 1.4
  • Example 1.5
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Lemma 2.4
  • Lemma 2.5
  • ...and 12 more