Categorial grammars with unique category assignment
Maxim Vishnikin, Alexander Okhotin
TL;DR
The paper studies categorial grammars with unique category assignment, in which each symbol receives exactly one category. It shows that this restriction does not limit context-free expressivity up to a homomorphic encoding: for every context-free language $L\subseteq\Sigma^+$ there exist an alphabet $\Omega$, a homomorphism $h \colon \Sigma \to \Omega^+$ and a unique-category grammar $G$ with target category $S$, such that $w\in L$ if and only if $h(w)$ is in the language defined by $G$. The construction uses several auxiliary encodings (e.g., $z_{A,B}$, $u_{A,B}$, $y_{t,B}$, $w_{A_1,\ldots,A_n}$) and a $\phi$-encoding with $\phi(p)=p/p$, and wraps the encodings in sentinel symbols so that they cannot interfere with one another; this lets a single symbol sequence simulate multiple syntactic roles. Starting from a context-free grammar in Greibach normal form, the authors map each terminal $a$ to a string $h(a)$ that encodes all rules involving $a$, and prove the main lemma: a string $y$ is generated from a nonterminal $X$ of the context-free grammar if and only if $h(y)$ reduces to $\phi(X)$. Consequently, the hardest language in Greibach’s hardest context-free language theorem can be chosen to be definable by a unique-category grammar, and the class also contains inherently ambiguous languages. The results invite a study of the unique-category restriction in other categorial formalisms and raise questions about the boundary between unambiguous and inherently ambiguous languages in this setting.
Abstract
A categorial grammar assigns one of several syntactic categories to each symbol of the alphabet, and the category of a string is then deduced from the categories assigned to its symbols using two simple reduction rules. This paper investigates a special class of categorial grammars, in which only one category is assigned to each symbol, thus eliminating ambiguity on the lexical level (in linguistic terms, a unique part of speech is assigned to each word). While unrestricted categorial grammars are equivalent to the context-free grammars, the proposed subclass initially appears weak, as it cannot define even some regular languages. It is proved in the paper that it is actually powerful enough to define a homomorphic encoding of every context-free language, in the sense that for every context-free language $L$ over an alphabet $\Sigma$ there is a language $L'$ over some alphabet $\Omega$ defined by a categorial grammar with unique category assignment and a homomorphism $h \colon \Sigma \to \Omega^+$, such that a string $w$ is in $L$ if and only if $h(w)$ is in $L'$. In particular, in Greibach's hardest context-free language theorem, it is sufficient to use a hardest language defined by a categorial grammar with unique category assignment.
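To make the two reduction rules concrete, the following is a minimal Python sketch of a recognizer for a basic categorial grammar in which every symbol has exactly one category. It is an illustration only and not the construction from the paper. The slash convention is an assumption (here $B/A$ followed by $A$ reduces to $B$, and $A$ followed by $A\backslash B$ reduces to $B$); the function names `reduce_pair`, `derivable` and `accepts`, the tuple representation of categories, and the toy lexicon `LEXICON` are all hypothetical choices made for this example.

```python
from itertools import product

# Categories: a primitive category is a string; a compound one is a tuple.
#   ("/", B, A)   stands for B/A : yields B when followed by an A on the right
#   ("\\", A, B)  stands for A\B : yields B when preceded by an A on the left
# (This slash convention is an assumption of the sketch.)


def reduce_pair(left, right):
    """Return all categories obtained by reducing two adjacent categories."""
    out = set()
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.add(left[1])   # B/A, A  ->  B
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.add(right[2])  # A, A\B  ->  B
    return out


def derivable(word, lexicon):
    """Categories derivable for a non-empty word, computed span by span.

    `lexicon` maps each symbol to its single category (unique assignment).
    """
    n = len(word)
    table = {(i, i + 1): {lexicon[s]} for i, s in enumerate(word)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):  # split point between the two halves
                for l, r in product(table[(i, k)], table[(k, j)]):
                    cell |= reduce_pair(l, r)
            table[(i, j)] = cell
    return table[(0, n)]


def accepts(word, lexicon, target):
    """True if the non-empty word reduces to the target category."""
    return len(word) > 0 and target in derivable(word, lexicon)


# Hypothetical toy lexicon: one category per symbol.
LEXICON = {
    "a": ("/", ("/", "S", "B"), "S"),  # (S/B)/S
    "c": "S",
    "b": "B",
}

if __name__ == "__main__":
    for w in ["c", "acb", "aacbb", "acbb", "ab"]:
        print(w, accepts(w, LEXICON, "S"))
```

With this toy lexicon the recognizer accepts strings of the form $a^n c b^n$, for example `acb` and `aacbb`, and rejects `acbb` and `ab`. The table-filling strategy is a standard CYK-style choice made for simplicity and is not taken from the paper.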
