Using Enriched Category Theory to Construct the Nearest Neighbour Classification Algorithm
Matthew Pugh, Jo Grundy, Corina Cirstea, Nick Harris
TL;DR
The paper addresses the lack of fully category-theoretic constructions of ML models by encoding the Nearest Neighbours Algorithm (NNA) entirely within Enriched Category Theory, with Cost as the base of enrichment. It derives an Enriched Nearest Neighbours Algorithm via profunctor composition, notably $NNA(y,x)=Cost((\mathbf{1}_{NY}\circ F^*)(y,x),(T_*\circ F^*)(y,x))$, and shows how V-NNA and $k$-NNA arise from changing the enrichment and from tensoring data representations. It further introduces label metrics by treating $Y$ as a Lawvere metric space, which enables soft and dependent classifications through $T_*(y,i)=Y(y,Ti)$ and asymmetric relations between labels. These contributions provide a rigorous, extensible, and potentially more explainable foundation for ML algorithms, including extensions to soft classification boundaries and hierarchy-aware classification.
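The formula above can be unpacked numerically. The following is a minimal sketch, not the authors' implementation, under these assumptions: composition of Cost-enriched profunctors is an infimum of sums; the hom of Cost is truncated subtraction; $\mathbf{1}_{NY}$ is read as the profunctor that is constantly $0$; and $F^*(i,x)$ is taken to be the distance from $x$ to the data point $Fi$. The names `cost_hom`, `discrete`, and `nna` are hypothetical and chosen only for illustration.

```python
import math

def cost_hom(a, b):
    """Internal hom of Cost, assumed here to be truncated subtraction max(b - a, 0)."""
    if b <= a:
        return 0.0
    return b - a  # is inf when b is inf and a is finite

def discrete(y, t):
    """Discrete label metric: 0 for equal labels, infinity otherwise."""
    return 0.0 if y == t else math.inf

def nna(y, x, points, labels, d_x=math.dist, d_y=discrete):
    """Sketch of NNA(y, x) = Cost((1_NY . F*)(y, x), (T_* . F*)(y, x)).

    Assumption: 1_NY contributes 0, so the first composite is the
    distance from x to its nearest data point of any label.
    """
    nearest_any = min(d_x(x, p) for p in points)
    # T_*(y, i) = Y(y, Ti), composed with F* as an infimum of sums.
    nearest_y = min(d_x(x, p) + d_y(y, t) for p, t in zip(points, labels))
    return cost_hom(nearest_any, nearest_y)

points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
labels = ["a", "a", "b"]
x = (3.0, 0.0)

score_a = nna("a", x, points, labels)  # nearest "a" point is 2 away, nearest overall is 1 away
score_b = nna("b", x, points, labels)  # "b" is the label of the nearest neighbour

# A finite label metric softens the boundary: labels at distance 0.5 from each other.
soft = lambda y, t: 0.0 if y == t else 0.5
soft_a = nna("a", x, points, labels, d_y=soft)
```

Under this reading, a score of 0 marks the label of the nearest neighbour, and a positive score measures how much further away the nearest point with that label is. With a finite metric on the labels, the score of a label is capped by its distance to the nearest neighbour's label, which is one way to picture the soft boundaries described above.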
Abstract
This paper is the first to construct and motivate a Machine Learning algorithm solely with Enriched Category Theory, adding to the evidence that Category Theory can provide valuable insights into the construction and explainability of Machine Learning algorithms. It is shown that a series of reasonable assumptions about a dataset leads to the construction of the Nearest Neighbours Algorithm. This construction is produced as an extension of the original dataset using profunctors in the category of Lawvere metric spaces, leading to a definition of an Enriched Nearest Neighbours Algorithm, which consequently also produces an enriched form of the Voronoi diagram. Further investigation of the generalisations this construction induces demonstrates how the $k$ Nearest Neighbours Algorithm may also be produced. Moreover, the new construction allows metrics on the classification labels to inform the outputs of the Enriched Nearest Neighbours Algorithm, enabling soft classification boundaries and dependent classifications. This paper is intended to be accessible without any knowledge of Category Theory.
