Faster Algorithms for Structured Matrix Multiplication via Flip Graph Search

Kirill Khoruzhii, Patrick Gelß, Sebastian Pokutta

TL;DR

This work introduces a flip-graph search framework to systematically discover low-rank, non-commutative bilinear schemes for structured matrix multiplication at small base sizes, then lifts these schemes to integers or rationals. By cataloging 15 structured formats and employing Hensel lifting from $\mathbb{F}_2$ and $\mathbb{F}_3$, the authors obtain thousands of schemes and prove improvements in asymptotic constants for 13 formats, including notable gains for $AA^{\top}$ and triangular-general products. A detailed methodology combines tensor decompositions with recursive call counting and corner-specific strategies for transpose products, yielding explicit base schemes such as a $4\times4$, rank-34 $AA^{\top}$ scheme with $\gamma_{\mathtt{gt}}=22/37$. The results demonstrate the efficacy of flip-graph guided searches for structured bilinear computations and provide a publicly available implementation and data release to facilitate future extensions and practical adoption.

Abstract

We give explicit low-rank bilinear non-commutative schemes for multiplying structured $n \times n$ matrices with $2 \leq n \leq 5$, which serve as building blocks for recursive algorithms with improved multiplicative factors in asymptotic complexity. Our schemes are discovered over $\mathbb{F}_2$ or $\mathbb{F}_3$ and lifted to $\mathbb{Z}$ or $\mathbb{Q}$. Using a flip graph search over tensor decompositions, we derive schemes for general, upper-triangular, lower-triangular, symmetric, and skew-symmetric inputs, as well as products of a structured matrix with its transpose. These schemes improve asymptotic constants for 13 of 15 structured formats. In particular, we obtain $4 \times 4$ rank-34 schemes for both multiplying a general matrix by its transpose and an upper-triangular matrix by a general matrix, improving the asymptotic factor from 8/13 (0.615) to 22/37 (0.595). Additionally, using $\mathbb{F}_3$ flip graphs, we discover schemes over $\mathbb{Q}$ that fundamentally require the inverse of 2, including a $2 \times 2$ symmetric-symmetric multiplication of rank 5 and a $3 \times 3$ skew-symmetric-general multiplication of rank 14 (improving upon AlphaTensor's 15).
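The factor $22/37$ can be recovered from the operation count of the $4 \times 4$ rank-34 scheme (12 recursive $\mathtt{gt}$ calls and 22 general $\mathtt{gg}$ products, as listed in the caption of Figure 4). The following is a sketch under assumptions not stated in the abstract: general products are costed as $T_{\mathtt{gg}}(n) \approx n^{\omega}$ with Strassen's exponent $\omega = \log_2 7$ (so $4^{\omega} = 49$), the $O(n^2)$ additions are treated as lower-order, and the symbol $T$ for the cost function is introduced here for illustration.

$$
\begin{aligned}
T_{\mathtt{gt}}(n) &= 12\,T_{\mathtt{gt}}(n/4) + 22\,T_{\mathtt{gg}}(n/4) + O(n^2), \\
\gamma_{\mathtt{gt}}\,n^{\omega} &\approx \frac{12\,\gamma_{\mathtt{gt}}}{49}\,n^{\omega} + \frac{22}{49}\,n^{\omega}
\qquad \text{(substituting } T_{\mathtt{gt}}(n) \approx \gamma_{\mathtt{gt}}\,n^{\omega}\text{)}, \\
\gamma_{\mathtt{gt}} &= \frac{22}{49 - 12} = \frac{22}{37} \approx 0.595 .
\end{aligned}
$$

Under this model, moving a product from the general count to the recursive count lowers the numerator and the denominator together, which is why the split between $\mathtt{gt}$ and $\mathtt{gg}$ products matters and not only the total rank.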

Paper Structure

This paper contains 11 sections, 32 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Strassen’s rank-$7$ decomposition for $2\times2$ matrix multiplication. a) Matrix product written as a bilinear contraction. b) The seven intermediate products. c) Reconstruction of the output entries. d) Tensor view: $T_{ijk}$ expressed as a sum of seven rank-1 terms; each term corresponds to one intermediate product $m_\ell$ in (b) and encodes how it contributes to each output $c_k$ in (c).
  • Figure 2: Rank-$17$ decomposition for the $3 \times 3$ matrix multiplication $G G^{\textnormal{T}}$. a) Block layout $C=GG^{\textnormal{T}}$ with $G$ partitioned into a $3\times3$ grid of blocks $A_1,\ldots,A_9$; by symmetry only six distinct blocks $C_1,\ldots,C_6$ need to be computed. b) General products $m_1,\ldots,m_8$. c) Nine recursive calls $r_1,\ldots,r_9$ computing the symmetric block products $A_iA_i^{\textnormal{T}}$. d) Reconstruction of $C_1,\ldots,C_6$ from the $m$’s and $r$’s. e) Tensor view: contraction tensor $T_{ijk}$ expressed as a sum of rank-1 terms.
  • Figure 3: Flip graph structure and operations. a) Three types of transformations between tensor decompositions: a flip (blue) modifies two rank-1 terms sharing a common factor, preserving the total rank; a reduction (red) eliminates one term when two rank-1 terms share two common factors; a plus-transition (purple) combines an inverse reduction with a flip to escape local plateaus. b) The flip graph organizes schemes by rank. Vertices represent correct matrix multiplication schemes (each shown as a sum of rank-1 tensors $u_i \otimes v_i \otimes w_i$). Horizontal edges (blue) correspond to flips within a fixed rank level. Vertical edges (red) correspond to reductions that decrease rank. Some connected components at rank $r-1$ may have no further reductions, necessitating plus-transitions to continue the descent.
  • Figure 4: $\langle 4, 4, 4 \colon 34 \rangle_\mathtt{gt}^{(12,0,0)}$. Structured matrix multiplication $C = A A^{\mathrm T}$ for a $4\times4$ block with symmetric $C$. Coefficients lie in $\mathbb{Z}$. Operation count: 34 multiplications ($12\,\mathtt{gt} + 22\,\mathtt{gg}$) and 141 additions.
  • Figure 5: $\langle 3, 3, 3 \colon 14 \rangle_\mathtt{kg}^{(0,0,0)}$. Batched cross product. Coefficients lie in $\mathbb{Q}$. Operation count: 14 multiplications and 126 additions.
  • ...and 2 more figures
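
The tensor view in Figure 1 and the flip move in Figure 3 can be checked numerically. The sketch below is illustrative only and makes several assumptions: it uses the textbook form of Strassen's seven products (the ordering and signs in the paper's figure may differ), a row-major flattening of the $2\times2$ matrices, and one common way of writing a flip on two terms that share their first factor. The helper names (`matmul_tensor`, `reconstruct`, `naive_terms`, `flip`) are hypothetical and are not taken from the authors' implementation.

```python
from itertools import product

def matmul_tensor(n=2):
    """T[i][j][k] = 1 iff a_i * b_j appears in c_k (row-major flattening)."""
    size = n * n
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for r, s, t in product(range(n), repeat=3):
        T[n * r + s][n * s + t][n * r + t] = 1
    return T

def reconstruct(terms, size=4):
    """Sum the rank-1 tensors u (x) v (x) w given as (u, v, w) triples."""
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for u, v, w in terms:
        for i, j, k in product(range(size), repeat=3):
            T[i][j][k] += u[i] * v[j] * w[k]
    return T

# Textbook Strassen scheme; entries are ordered (11, 12, 21, 22).
# Each triple is (coefficients on A, coefficients on B, contribution to C).
STRASSEN = [
    ([1, 0, 0, 1],  [1, 0, 0, 1],  [1, 0, 0, 1]),   # m1 = (a11+a22)(b11+b22)
    ([0, 0, 1, 1],  [1, 0, 0, 0],  [0, 0, 1, -1]),  # m2 = (a21+a22) b11
    ([1, 0, 0, 0],  [0, 1, 0, -1], [0, 1, 0, 1]),   # m3 = a11 (b12-b22)
    ([0, 0, 0, 1],  [-1, 0, 1, 0], [1, 0, 1, 0]),   # m4 = a22 (b21-b11)
    ([1, 1, 0, 0],  [0, 0, 0, 1],  [-1, 1, 0, 0]),  # m5 = (a11+a12) b22
    ([-1, 0, 1, 0], [1, 1, 0, 0],  [0, 0, 0, 1]),   # m6 = (a21-a11)(b11+b12)
    ([0, 1, 0, -1], [0, 0, 1, 1],  [1, 0, 0, 0]),   # m7 = (a12-a22)(b21+b22)
]

def naive_terms(n=2):
    """The standard rank-n^3 scheme: one term per scalar product a_rs * b_st."""
    size = n * n
    def unit(i):
        return [1 if j == i else 0 for j in range(size)]
    return [(unit(n * r + s), unit(n * s + t), unit(n * r + t))
            for r, s, t in product(range(n), repeat=3)]

def flip(term1, term2):
    """Flip two terms sharing their first factor u:
    u(x)v1(x)w1 + u(x)v2(x)w2 = u(x)(v1+v2)(x)w1 + u(x)v2(x)(w2-w1)."""
    (u1, v1, w1), (u2, v2, w2) = term1, term2
    if u1 != u2:
        raise ValueError("terms must share the first factor")
    new1 = (u1, [x + y for x, y in zip(v1, v2)], w1)
    new2 = (u2, v2, [y - x for x, y in zip(w1, w2)])
    return new1, new2

if __name__ == "__main__":
    target = matmul_tensor(2)
    print("Strassen reproduces T:", reconstruct(STRASSEN) == target)

    naive = naive_terms(2)
    # naive[0] and naive[1] both use a11, so they share the first factor.
    flipped = list(flip(naive[0], naive[1])) + naive[2:]
    print("Flip preserves T:", reconstruct(flipped) == target)
```

The flip leaves both the number of terms and the represented tensor unchanged, which is what makes it a horizontal edge in the flip graph; a search would apply such moves repeatedly in the hope of reaching a decomposition where a reduction becomes available.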