Categorial grammars with unique category assignment

Maxim Vishnikin; Alexander Okhotin

TL;DR

The paper studies categorial grammars with unique category assignment, in which each symbol of the alphabet receives exactly one category. Although this restriction rules out even some regular languages, it costs no context-free expressive power up to homomorphic encoding: for every context-free language $L\subseteq\Sigma^+$ there exist an alphabet $\Omega$, a homomorphism $h \colon \Sigma \to \Omega^+$, and a unique-category grammar $G$ with target category $S$ such that $w\in L$ iff $h(w)$ lies in the language defined by $G$. The construction uses several auxiliary encodings (e.g., $z_{A,B}$, $u_{A,B}$, $y_{t,B}$, $w_{A_1,\ldots,A_n}$) and a $\phi$-encoding with $\phi(p)=p/p$, and wraps the encodings in sentinel symbols to prevent cross-contamination, which enables the simulation of multiple syntactic roles. With these tools, the authors map each terminal $a$ of a context-free grammar in Greibach normal form to a string $h(a)$ that encodes all rules involving $a$, and prove the main lemma: a word $y$ is in the language defined by the CFG for a nonterminal $X$ iff $h(y)$ reduces to $\phi(X)$. Consequently, the hardest language in Greibach's hardest context-free language theorem can be chosen to be definable by a unique-category grammar, and the class also contains inherently ambiguous languages. The results invite further study of the unique-category restriction in other categorial formalisms and raise questions about the boundary between unambiguous and inherently ambiguous parses in this setting.

Abstract

A categorial grammar assigns one of several syntactic categories to each symbol of the alphabet, and the category of a string is then deduced from the categories assigned to its symbols using two simple reduction rules. This paper investigates a special class of categorial grammars, in which only one category is assigned to each symbol, thus eliminating ambiguity on the lexical level (in linguistic terms, a unique part of speech is assigned to each word). While unrestricted categorial grammars are equivalent to the context-free grammars, the proposed subclass initially appears weak, as it cannot define even some regular languages. It is proved in the paper that the subclass is nevertheless powerful enough to define a homomorphic encoding of every context-free language, in the sense that for every context-free language $L$ over an alphabet $\Sigma$ there is a language $L'$ over some alphabet $\Omega$ defined by a categorial grammar with unique category assignment and a homomorphism $h \colon \Sigma \to \Omega^+$, such that a string $w$ is in $L$ if and only if $h(w)$ is in $L'$. In particular, in Greibach's hardest context-free language theorem, it is sufficient to use a hardest language defined by a categorial grammar with unique category assignment.
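
To make the notion of reduction concrete, the following is a minimal sketch of a recognizer for a grammar with unique category assignment. It assumes the standard reduction rules of basic categorial grammars in Lambek-style notation (`A/B` followed by `B` gives `A`, and `B` followed by `B\A` gives `A`); the abstract does not spell out the rules, so this choice is an assumption. The tuple encoding of categories, the helper names (`right`, `left`, `combine`, `categories`, `accepts`), and the toy lexicon are all hypothetical and are not taken from the paper.

```python
from itertools import product

def right(result, arg):
    """Category result/arg: expects a category `arg` on its right."""
    return ("/", result, arg)

def left(arg, result):
    """Category arg\\result: expects a category `arg` on its left."""
    return ("\\", arg, result)

def combine(x, y):
    """Return the categories obtained by reducing the adjacent pair x y."""
    out = set()
    if isinstance(x, tuple) and x[0] == "/" and x[2] == y:
        out.add(x[1])  # A/B, B -> A
    if isinstance(y, tuple) and y[0] == "\\" and y[1] == x:
        out.add(y[2])  # B, B\A -> A
    return out

def categories(word, lexicon):
    """CYK-style table: all categories to which a non-empty word reduces."""
    n = len(word)
    # Unique category assignment: every symbol contributes exactly one category.
    table = {(i, i + 1): {lexicon[word[i]]} for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for x, y in product(table[(i, k)], table[(k, j)]):
                    cell |= combine(x, y)
            table[(i, j)] = cell
    return table[(0, n)]

def accepts(word, lexicon, target):
    """A word is accepted if it is non-empty and reduces to the target category."""
    return len(word) > 0 and target in categories(word, lexicon)

# Toy lexicon (illustrative only): one category per symbol.
LEXICON = {
    "a": right("S", "B"),  # a : S/B
    "b": "B",              # b : B
    "c": left("S", "S"),   # c : S\S
}

if __name__ == "__main__":
    for w in ["ab", "abcc", "ba", "a"]:
        print(w, accepts(w, LEXICON, "S"))
```

Under these assumptions the toy grammar accepts exactly the strings consisting of `ab` followed by any number of `c`. Since the lexicon is a plain dictionary from symbols to single categories, the unique-assignment restriction is enforced by the data structure itself.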
