Categorical Data Structures for Technical Computing
Evan Patterson, Owen Lynch, James Fairbanks
TL;DR
This work introduces acsets (attributed C-sets), a practical in-memory data structure that unifies graphs and data frames within a rigorous categorical framework. Acsets treat data as functors from finitely presented categories to Set and extend C-sets with typed attributes. They are implemented in Julia in the Catlab package, which relies on automatic code generation to achieve high performance. The authors establish that acsets form a slice category, which makes limits, colimits, and data migration operations available, and that acsets support structured cospans for modeling open systems. Empirical benchmarks show that acsets rival specialized graph libraries while remaining general enough to cover graph-like and relational objects. The approach offers a compositional foundation for technical computing with graphs, networks, and relational data, and the authors outline future extensions to other base categories and to schema-driven data structures.
Abstract
Many mathematical objects can be represented as functors from finitely presented categories $\mathsf{C}$ to $\mathsf{Set}$. For instance, graphs are functors to $\mathsf{Set}$ from the category with two parallel arrows. Such functors are known informally as $\mathsf{C}$-sets. In this paper, we describe and implement an extension of $\mathsf{C}$-sets having data attributes with fixed types, such as graphs with labeled vertices or real-valued edge weights. We call such structures "acsets," short for "attributed $\mathsf{C}$-sets." Derived from previous work on algebraic databases, acsets are a joint generalization of graphs and data frames. They also encompass more elaborate graph-like objects such as wiring diagrams and Petri nets with rate constants. We develop the mathematical theory of acsets and then describe a generic implementation in the Julia programming language, which uses advanced language features to achieve performance comparable with specialized data structures.
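To make the "graphs as functors" reading concrete, the following is a minimal sketch in Python, not the Julia implementation the paper describes and not Catlab's API. It assumes the schema with two objects E and V, two parallel arrows src and tgt from E to V, and one real-valued attribute weight on E. The names `WeightedGraph`, `add_vertices`, `add_edge`, and `incident` are hypothetical and chosen only for illustration. Each object becomes a finite set of row indices, each arrow becomes a column of indices into another table, and each attribute becomes an ordinary typed data column, which is the sense in which an acset resembles both a graph and a data frame.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WeightedGraph:
    """Toy attributed C-set (hypothetical, for illustration only).

    Schema: objects E and V, arrows src, tgt: E -> V,
    attribute weight: E -> float.
    """

    n_vertices: int = 0                                  # the set V is {0, ..., n_vertices - 1}
    src: list[int] = field(default_factory=list)         # column for the arrow src: E -> V
    tgt: list[int] = field(default_factory=list)         # column for the arrow tgt: E -> V
    weight: list[float] = field(default_factory=list)    # attribute column on E

    def add_vertices(self, n: int) -> range:
        """Add n vertices and return their indices."""
        start = self.n_vertices
        self.n_vertices += n
        return range(start, self.n_vertices)

    def add_edge(self, s: int, t: int, w: float) -> int:
        """Add one row to the E table and return its index."""
        if not (0 <= s < self.n_vertices and 0 <= t < self.n_vertices):
            raise ValueError("src and tgt must refer to existing vertices")
        self.src.append(s)
        self.tgt.append(t)
        self.weight.append(w)
        return len(self.src) - 1

    def incident(self, v: int, column: str) -> list[int]:
        """Return the edges e with column(e) == v, i.e. the preimage of v."""
        col = {"src": self.src, "tgt": self.tgt}[column]
        return [e for e, x in enumerate(col) if x == v]


# Build a triangle-shaped weighted graph: 0 -> 1, 1 -> 2, 0 -> 2.
g = WeightedGraph()
g.add_vertices(3)
g.add_edge(0, 1, 0.5)
g.add_edge(1, 2, 2.0)
g.add_edge(0, 2, 1.5)

out_edges_of_0 = g.incident(0, "src")  # edges leaving vertex 0
in_edges_of_2 = g.incident(2, "tgt")   # edges entering vertex 2
```

In this sketch the preimage lookup scans a whole column; a practical implementation would presumably maintain an index for such queries, but that detail is left out here to keep the correspondence between schema and tables visible.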
