Improving the Leading Constant of Matrix Multiplication

Josh Alman, Hantao Yu

Abstract

Algebraic matrix multiplication algorithms are designed by bounding the rank of matrix multiplication tensors, and then using a recursive method. However, designing algorithms in this way quickly leads to large constant factors: if one proves that the tensor for multiplying $n \times n$ matrices has rank $\leq t$, then the resulting recurrence shows that $M \times M$ matrices can be multiplied using $O(n^2 \cdot M^{\log_n t})$ operations, where the leading constant scales proportionally to $n^2$. Even modest increases in $n$ can blow up the leading constant too much to be worth the slight decrease in the exponent of $M$. Meanwhile, the asymptotically best algorithms use very large $n$, such that $n^2$ is larger than the number of atoms in the visible universe! In this paper, we give new ways to use tensor rank bounds to design matrix multiplication algorithms, which lead to smaller leading constants than the standard recursive method. Our main result shows that, if the tensor for multiplying $n \times n$ matrices has rank $\leq t$, then $M \times M$ matrices can be multiplied using only $n^{O(1/(\log n)^{0.33})} \cdot M^{\log_n t}$ operations. In other words, we improve the leading constant in general from $O(n^2)$ to $n^{O(1/(\log n)^{0.33})} < n^{o(1)}$. We then apply this and further improve the leading constant in a number of situations of interest. We show that, in the popularly-conjectured case where $\omega=2$, a new, different recursive approach can lead to an improvement. We also show that the leading constant of the current asymptotically fastest matrix multiplication algorithm, and any algorithm designed using the group-theoretic method, can be further improved by taking advantage of additional structure of the underlying tensor identities.
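As a small illustration of how the standard recursive method produces a leading constant, the sketch below counts operations for the classical case $n = 2$, $t = 7$ (Strassen's algorithm). It assumes Strassen's original identity with 18 block additions per recursive step, a base case costing one scalar multiplication, and $M$ a power of 2; the function names are hypothetical and the code is not taken from the paper. Under these assumptions the recurrence $T(M) = 7\,T(M/2) + 18\,(M/2)^2$ solves to $7 \cdot M^{\log_2 7} - 6M^2$, so the leading constant is 7.

```python
def recursive_op_count(M, n=2, t=7, adds=18):
    """Arithmetic operations used by the standard recursion on M x M inputs.

    Assumes M is a power of n, a rank-t identity for n x n matrices whose
    linear steps use `adds` block additions, and a 1 x 1 base case.
    """
    if M == 1:
        return 1  # one scalar multiplication
    block = M // n
    # t recursive block products plus `adds` additions of block x block matrices
    return t * recursive_op_count(block, n, t, adds) + adds * block * block


def strassen_closed_form(M):
    """7 * M^(log2 7) - 6 * M^2, using M^(log2 7) = 7^k when M = 2^k."""
    k = M.bit_length() - 1
    return 7 * 7**k - 6 * M * M


# The recurrence and the closed form agree on small powers of 2.
for k in range(8):
    M = 2**k
    assert recursive_op_count(M) == strassen_closed_form(M)
```

The same recurrence with a larger $n$ has an additive term that grows with the number of additions in the rank-$t$ identity, which is where the $O(n^2)$ leading constant described in the abstract comes from.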

Paper Structure

This paper contains 35 sections, 42 theorems, 173 equations, 1 figure.

Key Result

Theorem 2.1

For any $\varepsilon > 0$, there is a function $c : \mathbb{N} \to \mathbb{R}_{>0}$ with $c(n) \leq n^{1/O((\log n)^{\frac{1}{3}-\varepsilon})} < n^{o(1)}$ such that: For any positive integers $n,t$ with $n>1$, and any field $\mathbb{F}$, given a matrix multiplication tensor $\langle n,n,n\rangle$ w

Figures (1)

  • Figure 1: An example of a linear algorithm. The algorithm takes in $x_1,x_2,x_3$ as inputs and outputs $4x_1+2x_2,3x_1+3x_2+2x_3$. The matrix associated with it is $\begin{pmatrix} 4 & 2 & 0 \\ 3 & 3 & 2 \end{pmatrix}$.
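
The correspondence in the Figure 1 caption can be checked directly. The following is a minimal sketch, assuming the associated matrix has one row per output and one column per input, so that the two stated outputs give the rows $(4, 2, 0)$ and $(3, 3, 2)$; the function names are hypothetical.

```python
# Rows are outputs, columns are the inputs x1, x2, x3 (an assumed convention).
A = [[4, 2, 0],
     [3, 3, 2]]


def apply_matrix(matrix, x):
    """Matrix-vector product over plain Python lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def linear_algorithm(x1, x2, x3):
    """Straight-line program using only scalar multiplications and additions."""
    y1 = 4 * x1 + 2 * x2
    y2 = 3 * x1 + 3 * x2 + 2 * x3
    return [y1, y2]


# Both descriptions compute the same linear map.
assert linear_algorithm(1, 2, 3) == apply_matrix(A, [1, 2, 3])
```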

Theorems & Definitions (91)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Corollary 2.5
  • Corollary 2.6
  • Definition 4.1: Tensor
  • Definition 4.2: Kronecker Product of Tensors
  • Definition 4.3: Tensor Rank
  • Definition 4.4: Border Rank
  • ...and 81 more