Approximate all-pairs Hamming distances and 0-1 matrix multiplication
Miroslaw Kowaluk, Andrzej Lingas, Mia Persson
TL;DR
The paper studies the computational relationship between all-pairs Hamming distances and arithmetic 0-1 matrix multiplication. It complements Arslan's reduction of all-pairs Hamming distances to 0-1 matrix multiplication with a linear-time reduction in the reverse direction. It then gives a fast randomized algorithm for approximating all-pairs Hamming distances via dimension reduction, which, combined with the reverse reduction, yields a fast randomized algorithm for approximate arithmetic 0-1 matrix multiplication. Building on these tools, the authors present an output-sensitive randomized MST algorithm in generalized Hamming spaces and $(2+\epsilon)$-approximation algorithms for $\ell$-center and minimum-diameter $\ell$-clustering in the Hamming space $\{0,1\}^d$, which are substantially faster than the known $2$-approximation algorithms when both $\ell$ and $d$ are super-logarithmic. Together, these results speed up clustering and MST computations on high-dimensional data, which is relevant to hierarchical clustering and to applications in bioinformatics and machine learning.
Abstract
Arslan showed that computing all-pairs Hamming distances is easily reducible to arithmetic 0-1 matrix multiplication (IPL 2018). We provide a reverse, linear-time reduction of arithmetic 0-1 matrix multiplication to computing all-pairs distances in a Hamming space. We also present a fast randomized algorithm for approximate all-pairs distances in a Hamming space. By combining it with our reduction, we also obtain a fast randomized algorithm for approximate 0-1 matrix multiplication. Next, we present an output-sensitive randomized algorithm for a minimum spanning tree of a set of points in a generalized Hamming space: the lower the cost of the minimum spanning tree, the faster our algorithm runs. Finally, we provide $(2+\epsilon)$-approximation algorithms for the $\ell$-center clustering and minimum-diameter $\ell$-clustering problems in a Hamming space $\{0,1\}^d$ that are substantially faster than the known $2$-approximation ones when both $\ell$ and $d$ are super-logarithmic.
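
The link between the two problems rests on an elementary identity for 0-1 vectors $x, y$: $H(x,y) = |x| + |y| - 2\langle x, y\rangle$, where $|x|$ is the number of ones in $x$, and equivalently $\langle x, y\rangle = (|x| + |y| - H(x,y))/2$. The following is a minimal sketch of that identity in both directions, not the linear-time reduction of the paper. The function names are hypothetical, and both the matrix product and the Hamming distances are computed naively here, purely to illustrate the correspondence.

```python
def matmul01(A, B):
    """Arithmetic (integer) product of two 0-1 matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def hamming_via_matmul(X):
    """All-pairs Hamming distances between the 0-1 rows of X via one product X * X^T."""
    weights = [sum(row) for row in X]           # |x_i| = number of ones in row i
    Xt = [list(col) for col in zip(*X)]         # transpose of X
    G = matmul01(X, Xt)                         # G[i][j] = <x_i, x_j>
    n = len(X)
    # H(x_i, x_j) = |x_i| + |x_j| - 2 <x_i, x_j>
    return [[weights[i] + weights[j] - 2 * G[i][j] for j in range(n)]
            for i in range(n)]


def hamming(u, v):
    """Naive Hamming distance between two equal-length 0-1 vectors."""
    return sum(a != b for a, b in zip(u, v))


def matmul_via_hamming(A, B):
    """Recover the arithmetic product A * B from Hamming distances between
    rows of A and columns of B (illustration only; distances computed naively)."""
    cols = [list(col) for col in zip(*B)]
    # <r, c> = (|r| + |c| - H(r, c)) / 2, always an integer for 0-1 vectors
    return [[(sum(r) + sum(c) - hamming(r, c)) // 2 for c in cols] for r in A]
```

For example, with `X = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]`, `hamming_via_matmul(X)` agrees entry by entry with `hamming` applied to each pair of rows, and `matmul_via_hamming(A, B)` agrees with `matmul01(A, B)` for any compatible 0-1 matrices.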
