A Lower Bound on Unambiguous Context Free Grammars via Communication Complexity

Stefan Mengel, Harry Vinall-Smeeth

TL;DR

The paper studies how succinctly finite languages can be represented by context free grammars (CFGs) compared with unambiguous CFGs (uCFGs). It introduces a rectangle-cover framework that ties grammar size to disjoint rectangle covers, and uses a discrepancy-based argument from communication complexity to derive exponential lower bounds on uCFG representations of a finite language $L_n$. The main result is a doubly exponential separation between general CFGs and uCFGs, witnessed by $L_n$, which also implies that NFAs can be exponentially more succinct than uCFGs for finite languages. The approach advances the understanding of unambiguity and suggests broader applicability to factorised representations and knowledge compilation.

Abstract

Motivated by recent connections to factorised databases, we analyse the efficiency of representations by context free grammars (CFGs). Concretely, we prove a recent conjecture by Kimelfeld, Martens, and Niewerth (ICDT 2025), that for finite languages representations by general CFGs can be doubly-exponentially smaller than those by unambiguous CFGs. To do so, we show the first exponential lower bounds for representation by unambiguous CFGs of a finite language that can efficiently be represented by CFGs. Our proof first reduces the problem to proving a lower bound in a non-standard model of communication complexity. Then, we argue similarly in spirit to a recent discrepancy argument to show the required communication complexity lower bound. Our result also implies that a finite language may admit an exponentially smaller representation as a nondeterministic finite automaton than as an unambiguous CFG.
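
To make the notion of unambiguity concrete, the following is a minimal sketch that counts the parse trees of a word under a grammar in Chomsky normal form; a grammar is unambiguous exactly when every word it generates has one parse tree. The function name `count_parse_trees` and the two toy grammars `AMBIG` and `UNAMBIG` are illustrative assumptions and are not taken from the paper (in particular, `AMBIG` is not claimed to be the grammar of the paper's example).

```python
from functools import lru_cache

def count_parse_trees(rules, start, word):
    """Count parse trees of `word` from `start` for a grammar in Chomsky normal form.

    `rules` maps a nonterminal to a list of right-hand sides; each right-hand
    side is either a 1-tuple (terminal,) or a 2-tuple of nonterminals.
    """
    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # Number of parse trees deriving word[i:j] from the nonterminal `sym`.
        total = 0
        for rhs in rules.get(sym, []):
            if len(rhs) == 1:
                # Terminal rule: matches only a single matching letter.
                if j - i == 1 and word[i] == rhs[0]:
                    total += 1
            else:
                # Binary rule: try every split point of word[i:j].
                left, right = rhs
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(word))

# Hypothetical ambiguous grammar: S -> S S | a
AMBIG = {"S": [("S", "S"), ("a",)]}

# Hypothetical unambiguous grammar for the same language: S -> A S | a, A -> a
UNAMBIG = {"S": [("A", "S"), ("a",)], "A": [("a",)]}

print(count_parse_trees(AMBIG, "S", "aaa"))    # more than one tree: ambiguous
print(count_parse_trees(UNAMBIG, "S", "aaa"))  # exactly one tree
```

Both toy grammars generate the same words, but only the second assigns each word a single parse tree. The paper's question is how much larger a grammar may have to become when this uniqueness is required.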

Paper Structure

This paper contains 12 sections, 13 theorems, 39 equations, 1 figure.

Key Result

Theorem 1

For every $n\in \mathbb{N}$, there is a finite language $L_n$ over a binary alphabet in which all words have length $2n$ such that:

Figures (1)

  • Figure 1: Two different parse trees for the word $aaaaaa$ for the grammar of Example \ref{ex: ambig}.

Theorems & Definitions (31)

  • Theorem 1
  • Definition 2
  • Example 3
  • Example 4
  • Definition 5: Rectangle
  • Example 6
  • Proposition 7
  • Example 8
  • Proof
  • Lemma 10
  • ...and 21 more