From data to concepts via wiring diagrams

Jason Lo, Mohammadnima Jafari

TL;DR

This work develops a category-theoretic framework for extracting abstract, temporally structured concepts from sequential data by introducing quasi-skeleton wiring diagram (WD) graphs and proving their one-to-one correspondence with Hasse diagrams of posets via transitive reductions. It then builds practical algorithms (notably Hasse clustering) that convert sequences into WD-path matrices and cluster data around shared conceptual structures, bypassing reliance on metric similarity. The authors validate the approach on reinforcement-learning game data, recovering unique and multiple winning strategies and showing that WD-based clustering can outperform standard clustering on both clean and corrupted data, with robustness and scalability limited by current hardware (roughly $m\le 5$ events). The results demonstrate a principled way to derive human-interpretable, logically constrained concepts from time-series data, with potential applications across domains. The combination of category theory, graph theory, and data engineering yields a reproducible pipeline for concept discovery from sequential observations.

Abstract

A wiring diagram is a labeled directed graph that represents an abstract concept such as a temporal process. In this article, we introduce the notion of a quasi-skeleton wiring diagram graph, and prove that quasi-skeleton wiring diagram graphs correspond to Hasse diagrams. Using this result, we designed algorithms that extract wiring diagrams from sequential data. We used our algorithms in analyzing the behavior of an autonomous agent playing a computer game, and the algorithms correctly identified the winning strategies. We compared the performance of our main algorithm with two other algorithms based on standard clustering techniques (DBSCAN and agglomerative hierarchical), including when some of the data was perturbed. Overall, this article brings together techniques in category theory, graph theory, clustering, reinforcement learning, and data engineering.
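
To make the link between order relations and Hasse diagrams concrete, the following is a minimal sketch of a transitive reduction on a small acyclic "happens before" relation. It is not the authors' algorithm: the function names `reachable` and `hasse_edges` and the event labels are hypothetical, and the input is assumed to be a directed acyclic graph given as a set of edges.

```python
def reachable(adj, start):
    """Return every node reachable from `start` by a path of length >= 1."""
    seen = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen


def hasse_edges(edges):
    """Transitive reduction of an acyclic relation given as (u, v) pairs.

    An edge (u, v) is kept only if no other successor w of u can reach v,
    i.e. only if v covers u in the induced partial order.
    Assumption: the input graph has no directed cycles.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    reach = {u: reachable(adj, u) for u in adj}

    kept = set()
    for u, succs in adj.items():
        for v in succs:
            implied = any(v in reach.get(w, set()) for w in succs if w != v)
            if not implied:
                kept.add((u, v))
    return kept


# Hypothetical events: the edge ("get_key", "win") is implied by the other two.
observed = {("get_key", "open_door"), ("open_door", "win"), ("get_key", "win")}
hasse = hasse_edges(observed)
```

Here `hasse` keeps only the two covering edges, which is the sense in which a Hasse diagram records a partial order without its redundant arrows.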

Paper Structure

This paper contains 12 sections, 50 equations, 2 figures.

Figures (2)

  • Figure 1: The category $\mathcal{R}(J)$ where $J$ has size 4. Each node corresponds to a wiring diagram graph. The higher the color intensity of a node $H$, the higher the value of $a(H)$ in Algorithm \ref{algo:3-v1}.
  • Figure 2: Dendrogram for hierarchical clustering with $L^1$-norm, applied to winning episodes of version two of the game.

Theorems & Definitions (8)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof