From data to concepts via wiring diagrams
Jason Lo, Mohammadnima Jafari
TL;DR
This work develops a category-theoretic framework for extracting abstract, temporally structured concepts from sequential data. It introduces quasi-skeleton wiring diagram (WD) graphs and proves that they are in one-to-one correspondence with Hasse diagrams of posets via transitive reduction. It then builds practical algorithms (notably Hasse clustering) that convert sequences into WD-path matrices and cluster data around shared conceptual structures rather than relying on metric similarity. The authors validate the approach on reinforcement-learning game data, recovering both unique and multiple winning strategies and showing that WD-based clustering can outperform standard clustering on both clean and corrupted data; scalability is currently limited by hardware to roughly $m \le 5$ events. The results demonstrate a principled way to derive human-interpretable, logically constrained concepts from time-series data, with potential applications across domains. The combination of category theory, graph theory, and data engineering yields a reproducible pipeline for concept discovery from sequential observations.
Abstract
A wiring diagram is a labeled directed graph that represents an abstract concept such as a temporal process. In this article, we introduce the notion of a quasi-skeleton wiring diagram graph and prove that quasi-skeleton wiring diagram graphs correspond to Hasse diagrams. Using this result, we designed algorithms that extract wiring diagrams from sequential data. We used these algorithms to analyze the behavior of an autonomous agent playing a computer game, and they correctly identified the winning strategies. We compared the performance of our main algorithm with that of two other algorithms based on standard clustering techniques (DBSCAN and agglomerative hierarchical clustering), including cases where some of the data was perturbed. Overall, this article brings together techniques in category theory, graph theory, clustering, reinforcement learning, and data engineering.
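To make the link between sequential data and Hasse diagrams concrete, the following is a minimal sketch and not the authors' algorithm. It assumes every sequence contains the same events exactly once. It collects the precedence relations shared by all sequences, which form a strict partial order, and then takes the transitive reduction to obtain the edges of the Hasse diagram. The function names `common_order` and `hasse_edges` and the event labels are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def common_order(sequences):
    """Return pairs (a, b) such that a occurs before b in every sequence.

    Assumption: each sequence is a permutation of the same set of events.
    """
    events = set(sequences[0])
    relation = set()
    for a, b in permutations(events, 2):
        if all(seq.index(a) < seq.index(b) for seq in sequences):
            relation.add((a, b))
    return relation


def hasse_edges(relation):
    """Transitive reduction of a strict partial order.

    Keep (a, b) only if no element c lies strictly between a and b.
    """
    elements = {x for pair in relation for x in pair}
    return {
        (a, b)
        for (a, b) in relation
        if not any((a, c) in relation and (c, b) in relation for c in elements)
    }


# Hypothetical event labels, not taken from the article's game data.
sequences = [
    ["key", "door", "sword", "boss"],
    ["key", "sword", "door", "boss"],
]

order = common_order(sequences)
edges = hasse_edges(order)
print(sorted(edges))
```

In this toy input, "door" and "sword" appear in both orders, so neither precedes the other in the shared partial order, and the relation from "key" to "boss" is dropped by the reduction because it is implied by the intermediate events.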
