Around Context-Free Grammars -- a Normal Form, a Representation Theorem, and a Regular Approximation
Liliana Cojocaru
TL;DR
The paper introduces Dyck normal form, a syntactic refinement of Chomsky normal form in which the two nonterminals on the right-hand side of a rule are paired like brackets. This yields nested derivation trees and a natural homomorphism from Dyck words to the words generated by the grammar. Building on this, it proves a representation L = φ(D'_K), where φ is a homomorphism and D'_K is a subset of a one-sided Dyck language, using trace-words. It also gives a graphical, transition-based proof of the Chomsky–Schützenberger theorem: dependency graphs and an extended graph are constructed to produce a regular language whose intersection with a Dyck language characterizes the language of the context-free grammar. The paper then refines this construction to obtain a thinner regular language R_m and outlines a practical method for deriving a regular grammar G_r whose language is a superset approximation of the original, that is, L ⊆ L(G_r). The result is a graphically constructive framework linking context-free grammars, Dyck languages, and the Chomsky–Schützenberger theorem, offering systematic, though non-optimal, regular approximations with potential applications in parsing and language description.
Abstract
We introduce a normal form for context-free grammars, called Dyck normal form. This is a syntactic restriction of the Chomsky normal form in which the two nonterminals occurring on the right-hand side of a rule are paired nonterminals. This pairwise property allows us to define a homomorphism from Dyck words to words generated by a grammar in Dyck normal form. We prove that for each context-free language L, there exist an integer K and a homomorphism h such that L = h(D'_K), where D'_K is a subset of the one-sided Dyck language over K letters. Through a transition-like diagram for a context-free grammar in Dyck normal form, we effectively build a regular language R that satisfies the Chomsky–Schützenberger theorem. Using graphical approaches, we refine R so that the Chomsky–Schützenberger theorem still holds. Based on this refinement, we sketch a transition diagram for a regular grammar that generates a regular superset approximation of the initial context-free language.
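The shape of the representation L = h(D'_K) can be illustrated on a toy case. The sketch below is not the paper's construction: it assumes the language L = {a^n b^n : n ≥ 1}, takes K = 1 with a hypothetical bracket pair "[" and "]", picks the subset D'_1 = {[^n ]^n : n ≥ 1} of the Dyck language by hand, and uses the homomorphism h([) = a, h(]) = b. All function and variable names are illustrative.

```python
from typing import Dict, List

# Assumed bracket alphabet with K = 1 pair: "[" opens, "]" closes.
PAIRS: Dict[str, str] = {"[": "]"}


def is_dyck(word: str, pairs: Dict[str, str] = PAIRS) -> bool:
    """Return True if `word` is well balanced over the given bracket pairs."""
    stack: List[str] = []
    closers = set(pairs.values())
    for ch in word:
        if ch in pairs:
            # Opening bracket: remember which closer must match it.
            stack.append(pairs[ch])
        elif ch in closers:
            # Closing bracket: it must match the most recent open one.
            if not stack or stack.pop() != ch:
                return False
        else:
            # Symbol outside the bracket alphabet.
            return False
    return not stack


def apply_homomorphism(word: str, h: Dict[str, str]) -> str:
    """Extend a letter-to-string map `h` to whole words by concatenation."""
    return "".join(h[ch] for ch in word)


# Assumed homomorphism for the toy language {a^n b^n : n >= 1}.
H: Dict[str, str] = {"[": "a", "]": "b"}


def d_prime(n_max: int) -> List[str]:
    """Hand-picked subset of the Dyck language: [^n ]^n for 1 <= n <= n_max."""
    return ["[" * n + "]" * n for n in range(1, n_max + 1)]


words = d_prime(3)
images = [apply_homomorphism(w, H) for w in words]
```

Here every element of `words` passes `is_dyck`, and `images` lists the corresponding words of the toy language. In the paper, K, h, and D'_K are derived from a grammar in Dyck normal form rather than chosen by hand as in this sketch.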
