Using Enriched Category Theory to Construct the Nearest Neighbour Classification Algorithm

Matthew Pugh; Jo Grundy; Corina Cirstea; Nick Harris

TL;DR

The paper addresses the lack of fully category-theoretic constructions for ML models by encoding the Nearest Neighbours Algorithm (NNA) entirely within Enriched Category Theory, using Cost as the base. It derives an Enriched Nearest Neighbours Algorithm via profunctor composition, notably $NNA(y,x)=Cost((\mathbf{1}_{NY}\circ F^*)(y,x),(T_\ast\circ F^*)(y,x))$, and shows how V-NNA and $k$-NNA arise from enriching the framework and tensoring data representations. It further introduces label metrics by treating $Y$ as a Lawvere metric space, enabling soft and dependent classifications through $T_*(y,i)=Y(y,Ti)$ and asymmetric relations. These contributions provide a rigorous, extensible, and potentially more explainable foundation for ML algorithms, with practical implications for generalisation and interpretability in categorically informed learning systems, including extensions to soft boundaries and hierarchy-aware classifications.
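To make the formula for $NNA(y,x)$ concrete, the following is a minimal sketch rather than the paper's own implementation. It assumes that the hom of Cost is truncated subtraction, $Cost(a,b)=\max(b-a,0)$, that profunctor composition over a finite dataset reduces to a minimum over data indices $i$ of summed distances, and that the metric on $X$ is symmetric. The names `cost_hom`, `discrete_label_metric`, and `nna` are hypothetical and chosen only for illustration.

```python
import math


def cost_hom(a, b):
    """Assumed internal hom of Cost: truncated subtraction max(b - a, 0)."""
    if math.isinf(a):
        return 0.0  # nothing can exceed an infinite cost
    if math.isinf(b):
        return math.inf
    return max(b - a, 0.0)


def discrete_label_metric(y, label):
    """Discrete metric on labels: 0 if they agree, infinity otherwise (an assumption)."""
    return 0.0 if y == label else math.inf


def nna(y, x, points, labels, x_dist, y_dist):
    """Sketch of NNA(y, x) under the assumptions stated in the lead-in.

    nearest_any        plays the role of (1_{NY} . F^*)(y, x)
    nearest_with_label plays the role of (T_* . F^*)(y, x), using T_*(y, i) = Y(y, T i)
    """
    nearest_any = min(x_dist(p, x) for p in points)
    nearest_with_label = min(
        y_dist(y, t) + x_dist(p, x) for p, t in zip(points, labels)
    )
    return cost_hom(nearest_any, nearest_with_label)


if __name__ == "__main__":
    points = [0.0, 1.0]
    labels = ["a", "b"]
    dist = lambda a, b: abs(b - a)
    # A value of 0 means label y is carried by a nearest neighbour of x.
    print(nna("a", 0.2, points, labels, dist, discrete_label_metric))
    print(nna("b", 0.2, points, labels, dist, discrete_label_metric))
```

With the discrete label metric, an output of zero marks exactly the labels attached to a nearest data point, which is the classical nearest neighbour rule. Replacing `discrete_label_metric` with a finite-valued metric on labels gives the graded outputs that the summary calls soft classifications.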

Abstract

This paper is the first to construct and motivate a Machine Learning algorithm solely with Enriched Category Theory, supplementing evidence that Category Theory can provide valuable insights into the construction and explainability of Machine Learning algorithms. It is shown that a series of reasonable assumptions about a dataset lead to the construction of the Nearest Neighbours Algorithm. This construction is produced as an extension of the original dataset using profunctors in the category of Lawvere metric spaces, leading to a definition of an Enriched Nearest Neighbours Algorithm, which, consequently, also produces an enriched form of the Voronoi diagram. Further investigation of the generalisations this construction induces demonstrates how the $k$ Nearest Neighbours Algorithm may also be produced. Moreover, the new construction allows metrics on the classification labels to inform the outputs of the Enriched Nearest Neighbours Algorithm, enabling soft classification boundaries and dependent classifications. This paper is intended to be accessible without any knowledge of Category Theory.

Paper Structure (10 sections, 30 equations, 2 figures)

Introduction
Background
Nearest Neighbours Algorithm
Lawvere Metric Spaces
Functors and Profunctors
Constructing The Nearest Neighbours Algorithm
Generalising the Nearest Neighbours Algorithm
K Nearest Neighbours
Label Metrics for Soft Boundaries and Dependent Classifications
Conclusion

Figures (2)

Figure 1: An example of the classification regions produced by the nearest neighbour algorithm from data points sampled from two Gaussian distributions, representing the distributions of the two classes.
Figure 2: A plot of the values of $\textit{NNA}$ (left) and $4$-NNA (right) for $x,y\in [0,1]$. $30$ points were uniformly sampled from the interval $[0,1]$ and transformed by the function $f(x) = 0.4+0.1\sin(10x)-0.7x^2 +0.7x^3$, then randomly scaled by $\pm5\%$ to produce the $Y$ values. The hom objects were taken to be $X(a,b) = Y(a,b) = |b-a|$. The aggregation policy chosen for $4$-NNA is only zero when all tuple components agree on the class. Both colour scales clip outputs outside their stated range.
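
The sampling procedure described in the Figure 2 caption can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that "randomly scaled by $\pm5\%$" means multiplying each $f(x)$ by a factor drawn uniformly from $[0.95, 1.05]$, and the names `f`, `sample_dataset`, `noise`, and `seed` are hypothetical.

```python
import math
import random


def f(x):
    """The transformation stated in the Figure 2 caption."""
    return 0.4 + 0.1 * math.sin(10 * x) - 0.7 * x**2 + 0.7 * x**3


def sample_dataset(n=30, noise=0.05, seed=0):
    """Draw n inputs uniformly from [0, 1] and rescale f(x) by a random factor.

    Assumption: the scaling factor is uniform on [1 - noise, 1 + noise].
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [f(x) * rng.uniform(1.0 - noise, 1.0 + noise) for x in xs]
    return xs, ys


def hom(a, b):
    """Hom object used for both X and Y in the caption: |b - a|."""
    return abs(b - a)


if __name__ == "__main__":
    xs, ys = sample_dataset()
    print(len(xs), min(ys), max(ys))
```

The seed is fixed only so that repeated runs produce the same sample; the caption does not state one.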
