Fine-Grained Computation in 3-Space: Matrix Multiplication and Graph Problems
Quentin F. Stout
TL;DR
This work studies fine-grained parallel computation in 3D space under physical limits, addressing matrix multiplication and graph and maze problems. It derives a lower bound of $Ω(n^{2/3})$ time for multiplying $n \times n$ matrices in 3-space and gives multi-step systolic algorithms on a 3D mesh that achieve this bound when the matrices are over a ring. When they are not over a ring, multiplication takes $Θ(n^{3/4})$ time on a mesh larger than the input. For graphs, it solves APSP and bottleneck-path problems in $Θ(n^{3-α} \log n)$ time on a mesh of size $n^α$ (with $2 ≤ α ≤ 9/4$), and transitive closure in $Θ(n^{2/3} \log n)$ time on a mesh of size $n^2$. For 3D mazes, it gives time bounds that improve on 2D methods by using larger meshes and repeated squaring: $Θ(n^{6-c} \log n)$ for $4 ≤ c ≤ 9/2$, which is $Θ(n^{3/2} \log n)$ at the endpoint $c = 9/2$. These results show how fine-grained 3D models allow faster algorithms than 2D models for select problems while still respecting physical constraints on data movement. The findings have implications for hardware design and energy efficiency in future 3D computing architectures, and they raise open questions about the logarithmic factors and about additional path problems in 3D space.
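The repeated-squaring idea mentioned above can be illustrated with a minimal serial sketch in Python. It is not the mesh algorithm from the paper: it only shows how all-pairs shortest paths reduce to about $\log_2 n$ matrix products over the (min, +) operations, which do not form a ring. The function names `min_plus_product` and `apsp_by_repeated_squaring` are hypothetical, chosen for illustration.

```python
import math

INF = math.inf  # marks "no edge"


def min_plus_product(a, b):
    """(min, +) product of two square matrices given as lists of lists."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def apsp_by_repeated_squaring(weights):
    """All-pairs shortest path lengths via repeated (min, +) squaring.

    Serial illustration only; assumes no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)  # empty path from i to i
    covered = 1  # dist is currently correct for paths of at most `covered` edges
    while covered < n - 1:
        dist = min_plus_product(dist, dist)  # doubles the path length covered
        covered *= 2
    return dist


# Small example: a chain 0 -> 1 -> 2 -> 3 plus a direct but costly edge 0 -> 3.
example = [
    [INF, 1, INF, 10],
    [INF, INF, 2, INF],
    [INF, INF, INF, 3],
    [INF, INF, INF, INF],
]
shortest = apsp_by_repeated_squaring(example)
```

Each pass through the loop is one matrix product, and the loop runs roughly $\log_2 n$ times. Under the assumption that the mesh algorithms follow the same pattern, this is one plausible source of the $\log n$ factor in the bounds above.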
Abstract
Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that, whatever the technology is, a bit requires a minimum volume and communication travels at a bounded speed. These assumptions imply that multiplying $n \times n$ matrices takes $Ω(n^{2/3})$ time, and we show that this can be achieved by a fine-grained 3-d mesh of $n^2$ processors. While the constants are impractically large, this is asymptotically faster than parallel implementations of Strassen's algorithm, and the lower bound shows that some claims about parallelizing faster serial algorithms are impossible in 3-space. If the matrices are not over a ring then multiplication can be done in $Θ(n^{3/4})$ time by expanding to a mesh larger than the input. In 2-d (such as the surface of a chip) this approach is useless and $Θ(n)$ systolic algorithms are optimal even when the matrices are over a ring. Similarly, for path and maze problems there are approaches useful in 3-d but not 2-d.
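The step from the two physical assumptions to the $Ω(n^{2/3})$ bound can be sketched as follows. This is a reconstruction under stated assumptions, not the paper's proof. The symbols $v_0$ (minimum volume per bit), $s$ (maximum signal speed), $V$ (volume occupied by the input), $d$ (diameter of that region) and $T$ (running time) are introduced here for illustration. The sketch also takes for granted that some output depends on inputs stored a constant fraction of the diameter apart.

```latex
\begin{align*}
V &\ge 2n^2 v_0
  && \text{two $n \times n$ matrices, at least one bit per entry} \\
V &\le \frac{\pi}{6}\, d^3
  && \text{a set of diameter $d$ has at most the volume of a ball of diameter $d$} \\
d &\ge \left(\frac{12\, v_0}{\pi}\right)^{1/3} n^{2/3} = \Omega\!\left(n^{2/3}\right)
  && \text{combining the two lines above} \\
T &= \Omega\!\left(\frac{d}{s}\right) = \Omega\!\left(n^{2/3}\right)
  && \text{information must cross a constant fraction of $d$ at speed at most $s$}
\end{align*}
```

The same reasoning in the plane, with a minimum area per bit in place of $v_0$, gives $d = Ω(n)$ and hence $T = Ω(n)$, which agrees with the statement that $Θ(n)$ systolic algorithms are optimal in 2-d.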
