A Categorical Unification for Multi-Model Data: Part II Categorical Algebra and Calculus
Jiaheng Lu
TL;DR
The paper develops a category-theoretic foundation for querying heterogeneous data by introducing two formal languages, categorical calculus (declarative) and categorical algebra (procedural), and proves their expressive equivalence. It defines a rich operator suite (including Map, Project, Select, getReach, and Lim) covering set, tree, and graph data, and shows how trees (via Dewey codes) and graphs can be integrated into a single framework. A central equivalence theorem demonstrates that every calculus query can be expressed algebraically and vice versa. The paper also gives a comprehensive set of algebraic transformation rules for query optimization, along with complexity bounds of $O(q\cdot n^p)$ in data terms and $\mathrm{NSPACE}[\log n]$ for space. The work enables holistic, multi-model query planning and suggests extensions to shortest-path and aggregation queries, aiming toward practical, optimized query engines for unified categorical databases.
Abstract
Multi-model databases are designed to store, manage, and query data in various models, such as relational, hierarchical, and graph data, simultaneously. In this paper, we provide a theoretical basis for querying categorical databases. We propose two formal query languages, categorical calculus and categorical algebra, by extending relational calculus and relational algebra, respectively. We demonstrate the equivalence of these two query languages. We propose a series of transformation rules for categorical algebra to facilitate query optimization. Finally, we analyze the expressive power and computational complexity of the proposed query languages.
