Categorial grammars with unique category assignment

Maxim Vishnikin, Alexander Okhotin

TL;DR

The paper studies categorial grammars with unique category assignment, in which each symbol receives exactly one category. Although this subclass cannot define even some regular languages directly, it loses no context-free expressivity up to homomorphic encoding: for every context-free language $L\subseteq\Sigma^+$ there exist an alphabet $\Omega$, a homomorphism $h\colon \Sigma \to \Omega^+$, and a unique-category grammar $G$ with target category $S$ such that $w\in L$ if and only if $h(w)$ is in the language defined by $G$. The construction builds auxiliary encodings (e.g., $z_{A,B}$, $u_{A,B}$, $y_{t,B}$, $w_{A_1,\ldots,A_n}$) and a $\phi$-encoding with $\phi(p)=p/p$, then wraps the encodings in sentinel symbols to prevent cross-contamination, which allows multiple syntactic roles to be simulated under a single category assignment. Using these tools, the authors map each terminal $a$ of a context-free grammar in Greibach normal form to a string $h(a)$ that encodes all rules involving $a$, and prove the main lemma: a string $y$ is derivable from the nonterminal $X$ if and only if $h(y)$ reduces to $\phi(X)$. Consequently, the hardest language in Greibach's hardest context-free language theorem can be chosen to be definable by a unique-category grammar, and the class is shown to contain inherently ambiguous languages. The results invite further study of the unique-category restriction in other categorial formalisms and raise questions about the boundary between unambiguous and inherently ambiguous languages in this setting.

Abstract

A categorial grammar assigns one of several syntactic categories to each symbol of the alphabet, and the category of a string is then deduced from the categories assigned to its symbols using two simple reduction rules. This paper investigates a special class of categorial grammars in which only one category is assigned to each symbol, thus eliminating ambiguity on the lexical level (in linguistic terms, a unique part of speech is assigned to each word). While unrestricted categorial grammars are equivalent to context-free grammars, the proposed subclass initially appears weak, as it cannot define even some regular languages. It is proved in the paper that it is actually powerful enough to define a homomorphic encoding of every context-free language, in the sense that for every context-free language $L$ over an alphabet $\Sigma$ there are a language $L'$ over some alphabet $\Omega$, defined by a categorial grammar with unique category assignment, and a homomorphism $h \colon \Sigma \to \Omega^+$ such that a string $w$ is in $L$ if and only if $h(w)$ is in $L'$. In particular, in Greibach's hardest context-free language theorem, it is sufficient to use a hardest language defined by a categorial grammar with unique category assignment.
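
To make the two reduction rules concrete, the following is a minimal sketch of a recognizer for a categorial grammar with unique category assignment. It assumes the common convention that a category $A/B$ followed by $B$ reduces to $A$, and that $B$ followed by $B\backslash A$ reduces to $A$; the paper's exact notation is not reproduced here. All names (`fwd`, `bwd`, `reduce_pair`, `derivable_categories`, `accepts`) and the toy lexicon are hypothetical, chosen only for illustration, and the tabular CYK-style search is a generic technique rather than the authors' method.

```python
from itertools import product

def fwd(a, b):
    """Category a/b (assumed convention): yields a when followed by b."""
    return ("/", a, b)

def bwd(b, a):
    """Category b\\a (assumed convention): yields a when preceded by b."""
    return ("\\", b, a)

def reduce_pair(left, right):
    """Return every category obtainable from two adjacent categories."""
    out = set()
    # Forward rule: a/b, b  ->  a
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.add(left[1])
    # Backward rule: b, b\a  ->  a
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.add(right[2])
    return out

def derivable_categories(word, lexicon):
    """CYK-style table: categories derivable for the whole word.

    `lexicon` maps each symbol to exactly ONE category, which is the
    unique-category restriction discussed in the abstract.
    """
    n = len(word)
    if n == 0:
        return set()  # languages here are subsets of Sigma^+
    table = {(i, i + 1): {lexicon[sym]} for i, sym in enumerate(word)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for l, r in product(table[(i, k)], table[(k, j)]):
                    cell |= reduce_pair(l, r)
            table[(i, j)] = cell
    return table[(0, n)]

def accepts(word, lexicon, target):
    """True if the word reduces to the target category."""
    return target in derivable_categories(word, lexicon)

# Toy lexicon (an illustrative assumption): one category per symbol.
LEXICON = {"a": fwd("S", "S"), "c": "S", "b": bwd("S", "S")}
```

With this toy lexicon and target `S`, `accepts("aacb", LEXICON, "S")` holds while `accepts("ca", LEXICON, "S")` does not, since `a` can only consume an `S` to its right. The lexicon being a plain dictionary from symbols to single categories is what enforces the unique assignment; an unrestricted categorial grammar would map each symbol to a set of categories instead.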

Paper Structure

This paper contains 7 sections, 12 theorems, 59 equations.

Key Result

Theorem A

A language $L \subseteq \Sigma^+$ is defined by a categorial grammar if and only if it is defined by a context-free grammar.

Theorems & Definitions (48)

  • Definition 1
  • Definition 2: Ajdukiewicz (1935), Bar-Hillel et al.
  • Definition 3
  • Theorem A: Bar-Hillel et al.
  • Definition 4
  • Proposition 1
  • Example 1
  • Example 2
  • proof
  • Example 3
  • ...and 38 more