A Categorical Unification for Multi-Model Data: Part I Categorical Model and Normal Forms
Jiaheng Lu
TL;DR
The paper proposes a unified categorical framework for managing multi-model data (relational, XML, and graph) by modeling a database as a thin set category enriched with pullbacks, pushouts, and limits. It introduces two reduced representations, 1RR and 2RR, that connect category-based schemas with traditional normal forms: 1RR yields BCNF and XML NF, while 2RR achieves 4NF (and a graph-equivalent normalization). It also provides FD/MVD closure algorithms within categories, along with mapping algorithms from categorical schemas to relational, XML, and graph schemas, establishing a cross-model normal form theory. The work aims to enable lossless, redundancy-minimized representations across data models and to support future cross-model query processing and optimization in multi-model databases.
Abstract
Modern database systems face a significant challenge in effectively handling the Variety of data. The primary objective of this paper is to establish a unified data model and theoretical framework for multi-model data management. To achieve this, we present a categorical framework that unifies three types of structured or semi-structured data: relational, XML, and graph-structured data. Using the language of category theory, our framework offers a sound formal abstraction for representing these diverse data types. We extend the Entity-Relationship (ER) diagram with enriched semantic constraints, incorporating categorical ingredients such as pullbacks, pushouts, and limits. Furthermore, we develop a categorical normal form theory, which is applied to categorical representations of data to reduce redundancy and facilitate data maintenance. These normal forms apply to relational, XML, and graph data simultaneously, eliminating the need for the ad hoc, model-specific definitions found in earlier, separate normal form theories. Finally, we discuss the connections between this new normal form framework and Boyce-Codd normal form, fourth normal form, and XML normal form.
