Linguistic Analysis, Description, and Typological Exploration with Categorial Grammar (TheBench Guide)
Cem Bozsahin
TL;DR
TheBench presents a composition-centric, monadic approach to categorial grammar for linguistic analysis, description, and typological exploration. By treating synthesis as composition of semantic functions and pairing surface and predicate-argument commands ($s$-command and $l$-command), it enables a bottom-up, invariant analytic framework in which syntax is represented as a sequence of function compositions, $f \circ g = \lambda x.f(gx)$. The system supports explicit generation and manipulation of case functions, skeleton previews, and monadic data points, and it offers tools for generation, training with supervision, and batch experimentation, all without relying on top-down universal categories. Its emphasis on morphology as an autonomous, architecture-agnostic component, and its cycle-based workflows for analysis, exploration, and ranking, aim to bridge typological variation with compositional semantics in a flexible software environment built on Python and Common Lisp. In practice, it lets linguists build, test, and train bottom-up grammars that reveal cross-language regularities in reference and subcategorization, while providing traceable data representations for evaluation and typological comparison.
Abstract
TheBench is a tool for studying monadic structures in natural language. It is for writing monadic grammars to explore analyses, to compare diverse languages through their categories, and to train models of grammar from form-meaning pairs where syntax is a latent variable. Monadic structures are binary combinations of elements that employ the semantics of composition only. TheBench is essentially old-school categorial grammar used to syntacticize this idea, with the implication that although syntax is autonomous (recall \emph{colorless green ideas sleep furiously}), the treasure is in the baggage it carries at every step, viz. semantics, and more narrowly, predicate-argument structures indicating the choice of categorial reference and its consequent placeholders for decision in such structures. There is some new thought in the old school. Unlike traditional categorial grammars, application is turned into composition in monadic analysis. Moreover, every correspondence requires specifying two command relations, one for syntactic command and the other for semantic command. A monadic grammar of TheBench contains only synthetic elements (called `objects' in the category theory of mathematics) that are shaped by this analytic invariant, viz. composition. Both ingredients (command relations) of any analytic step must therefore be functions (`arrows' in category theory). TheBench is one implementation of the idea, for the iterative development of such functions along with a grammar of synthetic elements.
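
To make the phrase "application is turned into composition" concrete, the following is a minimal Python sketch of composition as $f \circ g = \lambda x.f(gx)$. It is not TheBench notation or code: the names `compose`, `sleep`, `definite`, and `constant` are hypothetical, and predicate-argument structures are rendered as plain strings for illustration only. The last step shows one textbook way of recasting application as composition, by treating an argument as a constant function; this is an assumption made for the example, not a claim about how TheBench implements it.

```python
def compose(f, g):
    """Return the composition f . g, i.e. the function lambda x: f(g(x))."""
    return lambda x: f(g(x))


# Hypothetical toy builders of predicate-argument structures (plain strings).
def sleep(subject):
    return f"sleep({subject})"


def definite(noun):
    return f"def({noun})"


def constant(value):
    """Lift a value to a function that ignores its input (assumed encoding)."""
    return lambda _: value


# Composition of two functions: (sleep . definite)(x) == sleep(definite(x)).
sleep_definite = compose(sleep, definite)
composed_result = sleep_definite("idea")

# Application recast as composition: sleep applied to "idea" is obtained by
# composing sleep with the constant function for "idea".
applied_result = compose(sleep, constant("idea"))(None)
```

Here `composed_result` is the string `sleep(def(idea))` and `applied_result` is `sleep(idea)`, the same value that direct application `sleep("idea")` would give. The point of the sketch is only that a single combining operation, composition, can cover both cases.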
