Generalized cyclic symmetric decompositions for the matrix multiplication tensor

Charlotte Vermeylen, Marc Van Barel

TL;DR

The paper tackles the challenge of discovering fast matrix multiplication (FMM) algorithms by casting matrix multiplication as a canonical polyadic decomposition (CPD) of the matrix multiplication tensor $T_{mpn}$ and addressing the non-uniqueness and the large parameter search space via a generalized cyclic symmetric (CS) structure in the factor matrices. This CS structure is integrated into an augmented Lagrangian solver and extended to modes of unequal size through a generalized CS formulation, with recursive and subblock CS strategies reducing the number of parameters further. Numerical results across multiple tensors show that the generalized CS approach yields more exact and practically useful decompositions, including sparse polyadic decompositions (PDs) with entries in $\{0, \pm 1\}$. The work thus improves convergence, expands the set of practical CPDs, and aids the discovery of efficient FMM algorithms with potentially hardware-friendly properties.
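As a concrete instance of casting matrix multiplication as a polyadic decomposition, the sketch below builds $T_{222}$ and checks that Strassen's classical rank-7 algorithm corresponds to a decomposition with entries in $\{0, \pm 1\}$. The row-major vectorization, the index ordering of the tensor, and the helper names `matmul_tensor` and `cpd_to_tensor` are assumptions made for illustration and need not match the paper's conventions.

```python
import numpy as np

def matmul_tensor(m, p, n):
    """T[a, b, c] = 1 iff vec(A)[a] * vec(B)[b] contributes to vec(C)[c]
    in C = A @ B, with row-major vec (an assumed convention)."""
    T = np.zeros((m * p, p * n, m * n), dtype=int)
    for i in range(m):
        for k in range(p):
            for j in range(n):
                T[i * p + k, k * n + j, i * n + j] = 1
    return T

def cpd_to_tensor(U, V, W):
    """Sum of the rank-1 terms u_r (outer) v_r (outer) w_r over all columns r."""
    return np.einsum("ar,br,cr->abc", U, V, W)

# Strassen's algorithm written as factor matrices (one column per product M1..M7).
# Rows of U: a11, a12, a21, a22; rows of V: b11, b12, b21, b22; rows of W: c11, c12, c21, c22.
U = np.array([[1, 0, 1, 0, 1, -1,  0],
              [0, 0, 0, 0, 1,  0,  1],
              [0, 1, 0, 0, 0,  1,  0],
              [1, 1, 0, 1, 0,  0, -1]])
V = np.array([[1, 1,  0, -1, 0, 1, 0],
              [0, 0,  1,  0, 0, 1, 0],
              [0, 0,  0,  1, 0, 0, 1],
              [1, 0, -1,  0, 1, 0, 1]])
W = np.array([[1,  0, 0, 1, -1, 0, 1],
              [0,  0, 1, 0,  1, 0, 0],
              [0,  1, 0, 1,  0, 0, 0],
              [1, -1, 1, 0,  0, 1, 0]])

# The decomposition is exact if the 7 rank-1 terms sum to T_222.
is_exact = np.array_equal(cpd_to_tensor(U, V, W), matmul_tensor(2, 2, 2))

# Using the decomposition as an algorithm: 7 scalar multiplications instead of 8.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
products = (U.T @ A.ravel()) * (V.T @ B.ravel())
C = (W @ products).reshape(2, 2)
print(is_exact, np.allclose(C, A @ B))
```

The number of columns of the factor matrices is the number of scalar multiplications in the resulting algorithm, which is why low-rank decompositions are of interest.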

Abstract

A new generalized cyclic symmetric structure in the factor matrices of polyadic decompositions of matrix multiplication tensors for non-square matrix multiplication is proposed to reduce the number of variables in the optimization problem and in this way improve the convergence. The structure is implemented in an existing numerical optimization algorithm. Extensive numerical experiments show that the proposed structure indeed finds more (practical) decompositions.

Paper Structure

This paper contains 13 sections, 2 theorems, 40 equations, 2 figures, and 6 tables.

Key Result

Proposition 2.1

If $U$, $V$, and $W$ are the factor matrices of a $\mathrm{PD}_r(T_{mpn})$, then the factor matrices $U'$, $V'$, and $W'$, whose columns are formed for $i_1, i_2 = 1, \dots, r$ from columns $i_1$ and $i_2$ of $U$, $V$, and $W$, respectively, using the Kronecker product '$\otimes$', are the factor matrices of a $\mathrm{PD}_{r^2}\left(T_{m^2p^2n^2}\right)$.
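The recursion can be checked numerically. The sketch below assumes that the columns of $U'$, $V'$, and $W'$ are plain Kronecker products of column pairs, which reproduces the Kronecker product of $T_{mpn}$ with itself. This equals $T_{m^2p^2n^2}$ only after a reordering of the vectorized indices, handled here by the hypothetical helpers `block_vec` and `block_unvec`; the paper's exact definition may build that reordering into the factor matrices. The tensor convention and all helper names are assumptions, and Strassen's rank-7 decomposition of $T_{222}$ serves as the starting point.

```python
import numpy as np

def matmul_tensor(m, p, n):
    """Matrix multiplication tensor for row-major vec (an assumed convention)."""
    T = np.zeros((m * p, p * n, m * n), dtype=int)
    for i in range(m):
        for k in range(p):
            for j in range(n):
                T[i * p + k, k * n + j, i * n + j] = 1
    return T

def cpd_to_tensor(U, V, W):
    return np.einsum("ar,br,cr->abc", U, V, W)

def block_vec(X, r, c):
    """Vectorize an (r*r) x (c*c) matrix block by block (outer block index first)."""
    return X.reshape(r, r, c, c).transpose(0, 2, 1, 3).ravel()

def block_unvec(x, r, c):
    """Inverse of block_vec."""
    return x.reshape(r, c, r, c).transpose(0, 2, 1, 3).reshape(r * r, c * c)

# Strassen's rank-7 decomposition of T_222 (columns are the products M1..M7).
U = np.array([[1, 0, 1, 0, 1, -1, 0], [0, 0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1, 0], [1, 1, 0, 1, 0, 0, -1]])
V = np.array([[1, 1, 0, -1, 0, 1, 0], [0, 0, 1, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 0, 1], [1, 0, -1, 0, 1, 0, 1]])
W = np.array([[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]])
m = p = n = 2

# Column i1 * r + i2 of np.kron(U, U) is kron(u_{i1}, u_{i2}); same for V and W.
U2, V2, W2 = np.kron(U, U), np.kron(V, V), np.kron(W, W)

# The r^2 = 49 rank-1 terms sum to the Kronecker product of T_222 with itself.
T = matmul_tensor(m, p, n)
squares_ok = np.array_equal(cpd_to_tensor(U2, V2, W2), np.kron(T, T))

# Up to the block reordering, this multiplies 4 x 4 matrices with 49 products.
rng = np.random.default_rng(0)
A = rng.standard_normal((m * m, p * p))
B = rng.standard_normal((p * p, n * n))
products = (U2.T @ block_vec(A, m, p)) * (V2.T @ block_vec(B, p, n))
C = block_unvec(W2 @ products, m, n)
print(U2.shape, squares_ok, np.allclose(C, A @ B))
```

With Strassen as input, the result has rank 49, compared with 64 for the standard algorithm on $4 \times 4$ matrices.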

Figures (2)

  • Figure 1: Illustration of the generalized CS factor matrices of a $\mathrm{PD}_{20}\left(T_{234}\right)$, with $(s,t):=(2,2)$.
  • Figure 2: Convergence of the cost function in (eq: unconstr_LSQ_min) using the AL method from [vermeylen2023stability] with $u:=-l:=1$ and for 50 random starting points generated with $\texttt{randn}$ of size $10^{-2}$ to find $\mathrm{PD}_{11}\left(T_{223}\right)$s, with $(s,t):=(7,0)$.
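
The starting-point setup described for Figure 2 can be sketched as follows: 50 starting points drawn from a standard normal generator, scaled by $10^{-2}$, and evaluated on a least-squares cost for rank-11 decompositions of $T_{223}$. The exact form of the cost (here half the squared Frobenius norm of the residual), the tensor index convention, and the helper names are assumptions. The AL solver and the CS parametrization with $(s,t) = (7,0)$ are not reproduced.

```python
import numpy as np

def matmul_tensor(m, p, n):
    """Matrix multiplication tensor for row-major vec (an assumed convention)."""
    T = np.zeros((m * p, p * n, m * n))
    for i in range(m):
        for k in range(p):
            for j in range(n):
                T[i * p + k, k * n + j, i * n + j] = 1.0
    return T

def cost(U, V, W, T):
    """Assumed cost: 0.5 * || sum_r u_r o v_r o w_r - T ||_F^2."""
    residual = np.einsum("ar,br,cr->abc", U, V, W) - T
    return 0.5 * np.sum(residual ** 2)

m, p, n, r = 2, 2, 3, 11
T = matmul_tensor(m, p, n)

# 50 unstructured starting points with entries of size 1e-2.
rng = np.random.default_rng(0)
starts = [
    tuple(1e-2 * rng.standard_normal((rows, r)) for rows in (m * p, p * n, m * n))
    for _ in range(50)
]
initial_costs = np.array([cost(U, V, W, T) for U, V, W in starts])
print(initial_costs.min(), initial_costs.max())
```

Because the starting factors are tiny, every initial cost lies close to $\tfrac{1}{2}\|T_{223}\|_F^2 = mpn/2 = 6$; an exact decomposition corresponds to a cost of zero.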

Theorems & Definitions (3)

  • Proposition 2.1: Recursive PD
  • Corollary 2.2
  • proof