Category Theory for Supercomputing: The Tensor Product of Linear BSP Algorithms
Thomas Koopman, Rob H. Bisseling, Sven-Bodo Scholz
TL;DR
This work addresses how to construct parallel algorithms for tensor products of linear functions within the Bulk Synchronous Parallel (BSP) model by introducing linear BSP algorithms and a tensor-product recipe. The authors show that a BSP algorithm for $f_1 \otimes \cdots \otimes f_d$ can be derived from linear BSP algorithms for each $f_i$, where each algorithm is built from distributions, computation steps, and communication steps. The derivation relies on the distributivity of $\otimes$ over $\oplus$ and on the functoriality of the tensor product. They apply the framework to the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT-II), deriving higher-dimensional DFTs and parallel DCTs and detailing their linear BSP decompositions and tensor-product extensions. The result is a compositional, category-theoretic perspective that yields scalable, structured parallel transforms and suggests broader applicability to HPC algorithms and potentially quantum-inspired constructions.
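The distributivity of $\otimes$ over $\oplus$ mentioned above can be checked numerically on small matrices. The following is a minimal sketch, not the authors' code: it assumes the Kronecker product as the matrix form of $\otimes$ and a block-diagonal matrix as the matrix form of $\oplus$, and the helper `direct_sum` and the matrix sizes are hypothetical choices made for illustration. Reading a block-diagonal matrix as "one local linear map per processor" is likewise an assumption of this sketch.

```python
import numpy as np

def direct_sum(a, b):
    """Return the block-diagonal matrix diag(a, b), i.e. the direct sum of a and b."""
    rows, cols = a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]
    out = np.zeros((rows, cols), dtype=np.result_type(a, b))
    out[:a.shape[0], :a.shape[1]] = a   # upper-left block
    out[a.shape[0]:, a.shape[1]:] = b   # lower-right block
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))  # local map on a hypothetical processor 0
B = rng.standard_normal((3, 3))  # local map on a hypothetical processor 1
C = rng.standard_normal((4, 4))  # map acting on the second tensor factor

# (A ⊕ B) ⊗ C compared with (A ⊗ C) ⊕ (B ⊗ C)
lhs = np.kron(direct_sum(A, B), C)
rhs = direct_sum(np.kron(A, C), np.kron(B, C))
distributes = np.allclose(lhs, rhs)
print(distributes)
```

With NumPy's Kronecker convention the identity holds exactly when the direct sum is the left factor; with the direct sum as the right factor it holds only up to a permutation of rows and columns.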
Abstract
We show that a particular class of parallel algorithms for linear functions can be straightforwardly generalized to parallel algorithms for their tensor product. The central idea is to take a model of parallel algorithms, Bulk Synchronous Parallel (BSP), that decomposes parallel algorithms into so-called supersteps of two types: a computation superstep that only does local computations, or a communication superstep that only communicates between processors. We connect each type of superstep to linear algebra with functors. Each superstep in isolation is simple enough that its tensor product can be computed in Vect with well-known techniques of linear algebra. We then translate the tensor product of each superstep back to the language of BSP algorithms. The functoriality of the tensor product allows us to compose the supersteps back into a BSP algorithm for the tensor product of the original functions. We state the recipe for creating these new algorithms with only a minimal amount of algebra, so that it can be applied without understanding the category-theoretic details. We have previously used this recipe to derive an efficient algorithm for the higher-dimensional Discrete Fourier Transform, which we use as an example throughout this paper. We also derive a parallel algorithm for the Discrete Cosine Transform to demonstrate the generality of our approach.
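As a small sequential illustration of the linear algebra behind the running example, the sketch below checks two facts with NumPy: the two-dimensional DFT is the tensor (Kronecker) product of two one-dimensional DFT matrices, and functoriality lets that product be factored into steps that each act on one dimension. It is not a BSP algorithm and not the authors' implementation; the helper `dft_matrix`, the sizes `n1` and `n2`, and the row-major flattening are assumptions made for illustration.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized n-by-n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

n1, n2 = 4, 6  # hypothetical sizes of the two dimensions
rng = np.random.default_rng(1)
x = rng.standard_normal((n1, n2)) + 1j * rng.standard_normal((n1, n2))

F1, F2 = dft_matrix(n1), dft_matrix(n2)

# Apply F1 ⊗ F2 to the row-major flattening of x and compare with the 2D FFT.
y_kron = (np.kron(F1, F2) @ x.reshape(-1)).reshape(n1, n2)
matches_fft2 = np.allclose(y_kron, np.fft.fft2(x))

# Functoriality: (F1 ⊗ I)(I ⊗ F2) = (F1 I) ⊗ (I F2) = F1 ⊗ F2.
I1, I2 = np.eye(n1), np.eye(n2)
factored = np.kron(F1, I2) @ np.kron(I1, F2)
is_functorial = np.allclose(factored, np.kron(F1, F2))

print(matches_fft2, is_functorial)
```

The same factorization is what allows a product of steps to be tensored one step at a time, which is the property the abstract relies on when composing supersteps.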
