Massively Parallel Computation of Similarity Matrices from Piecewise Constant Invariants
Björn H. Wehlin
TL;DR
The paper addresses scalable computation of similarity and inner-product matrices for large collections of piecewise constant functions (PCFs). It uses rectangle iteration, which avoids fixed grids and enables linear-time, allocation-free calculations on pairs of PCFs. PCF operations are formalized as combinations, reductions, and functionals; from these, the paper builds integrated combination matrices and PCF integrals that yield distance and Gram matrices, including time-dependent and weighted variants. Key contributions include a reduction-tree approach that reuses memory through a reduction accumulator, the GPU-accelerated masspcf implementation, and support for multidimensional arrays of PCFs. The implementation scales to multi-GPU hardware (e.g., 500k PCFs across 8 GPUs in about 423 seconds). These advances enable large-scale PCF-based analyses in fields such as TDA and computational statistics, with pairwise similarity computations carried out at machine precision.
Abstract
We present a computational framework for piecewise constant functions (PCFs) and use it for several types of computations that are useful in statistics, such as averages and similarity matrices. We give a linear-time, allocation-free algorithm for working with pairs of PCFs at machine precision. From this, we derive algorithms for computing reductions of several PCFs. The algorithms have been implemented in a highly scalable fashion for parallel execution on CPU and, in some cases, (multi-)GPU, and are provided in a Python package. In addition, we provide support for multidimensional arrays of PCFs and vectorized operations on these. As a stress test, we have computed a distance matrix from 500,000 PCFs using 8 GPUs.
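To make the idea of working with pairs of PCFs concrete, the following is a minimal pure-Python sketch of a merge-style pass over two PCFs and an L^p distance built on top of it. It is an illustration under stated assumptions, not the package's implementation or API: the representation (a sorted list of `(time, value)` breakpoints starting at time 0, with the last value extending to infinity) and the names `iterate_rectangles`, `lp_distance`, and `distance_matrix` are hypothetical. The Python generator also allocates tuples, so the sketch shows only the linear-time traversal, not the allocation-free property.

```python
import math

def iterate_rectangles(f, g):
    """Yield (left, right, f_value, g_value) for each maximal interval on
    which both PCFs are constant. Assumes each PCF is a sorted list of
    (time, value) breakpoints whose first time is 0 (hypothetical format)."""
    i = j = 0
    t = 0.0
    while True:
        next_f = f[i + 1][0] if i + 1 < len(f) else math.inf
        next_g = g[j + 1][0] if j + 1 < len(g) else math.inf
        t_next = min(next_f, next_g)
        yield t, t_next, f[i][1], g[j][1]
        if t_next == math.inf:
            return
        # Advance whichever function(s) change value at t_next.
        if next_f == t_next:
            i += 1
        if next_g == t_next:
            j += 1
        t = t_next

def lp_distance(f, g, p=1.0):
    """L^p distance computed exactly from the rectangles, with no fixed grid."""
    total = 0.0
    for left, right, fv, gv in iterate_rectangles(f, g):
        diff = abs(fv - gv)
        if diff == 0.0:
            continue  # also skips the final unbounded interval when f == g there
        if right == math.inf:
            raise ValueError("PCFs differ on an unbounded interval")
        total += (right - left) * diff ** p
    return total ** (1.0 / p)

def distance_matrix(pcfs, p=1.0):
    """Symmetric pairwise distance matrix as nested lists (serial sketch)."""
    n = len(pcfs)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d[a][b] = d[b][a] = lp_distance(pcfs[a], pcfs[b], p)
    return d

f = [(0.0, 1.0), (2.0, 0.0)]  # 1 on [0, 2), then 0
g = [(0.0, 2.0), (1.0, 0.0)]  # 2 on [0, 1), then 0
print(lp_distance(f, g))      # 2.0
```

Each pass visits every breakpoint of the two inputs once, so the cost is linear in the total number of breakpoints. The entries of the matrix are independent of one another, which is what makes the pairwise computation a natural candidate for parallel execution.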
