Butterfly factorization with error guarantees

Quoc-Tung Le, Léon Zheng, Elisa Riccietti, Rémi Gribonval

TL;DR

This work develops a general theory for deformable butterfly factorization by introducing Kronecker-sparse factors and the chainability concept, which guarantees the existence of an optimum and enables provable error bounds. It presents a hierarchical factorization algorithm and a refined butterfly algorithm with orthonormalization that achieves a quasi-optimal approximation to the target matrix relative to the best possible butterfly factorization, with a computable constant ${C_{\boldsymbol{\beta}}}$ depending only on the architecture. The framework unifies several known butterfly architectures (e.g., square dyadic, Monarch, deformable butterfly) and extends error guarantees to arbitrary chainable patterns, including generalized CLR properties. Numerical experiments demonstrate fast, accurate approximations that surpass gradient-based methods in speed and resist noise when orthonormalization is used. The results offer practical guidance for secure fast linear operator evaluation in applications requiring structured matrices with fixed sparsity, and open avenues for handling unknown permutations and hardware-aware implementations.

Abstract

In this paper, we investigate the butterfly factorization problem, i.e., the problem of approximating a matrix by a product of sparse and structured factors. We propose a new formal mathematical description of such factors that encompasses many different variations of butterfly factorization with different choices of the prescribed sparsity patterns. Among these supports, we identify those that ensure that the factorization problem admits an optimum, thanks to a new property called ``chainability''. For those supports, we propose a new butterfly algorithm that yields an approximate solution to the butterfly factorization problem and that is supported by stronger theoretical guarantees than existing factorization methods. Specifically, we show that the ratio of the approximation error to the minimum value is bounded by a constant, independent of the target matrix.
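To make "a product of sparse and structured factors" concrete, the following minimal sketch builds an exact factorization of a Hadamard matrix into $L = \log_2(n)$ factors with two nonzeros per row and column. It relies on the standard Kronecker identity $\mathbf{H}_2^{\otimes L} = \prod_{\ell=1}^{L} (\mathbf{I}_{2^{\ell-1}} \otimes \mathbf{H}_2 \otimes \mathbf{I}_{2^{L-\ell}})$ and is not the paper's algorithm; the function names are hypothetical and chosen for illustration only.

```python
import numpy as np

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

def hadamard_butterfly_factors(L):
    """Return L sparse factors whose product is the 2^L x 2^L Hadamard matrix.

    Hypothetical helper: factor number ell is I_{2^(ell-1)} (x) H2 (x) I_{2^(L-ell)}.
    """
    factors = []
    for ell in range(1, L + 1):
        left = np.eye(2 ** (ell - 1))
        right = np.eye(2 ** (L - ell))
        factors.append(np.kron(np.kron(left, H2), right))
    return factors

def sylvester_hadamard(L):
    """Dense Hadamard matrix built as the L-fold Kronecker power of H2."""
    H = np.ones((1, 1))
    for _ in range(L):
        H = np.kron(H, H2)
    return H

L = 4                      # matrix size n = 2^L = 16
n = 2 ** L
factors = hadamard_butterfly_factors(L)
A = sylvester_hadamard(L)
A_hat = np.linalg.multi_dot(factors)

# Relative approximation error in Frobenius norm: ||A - A_hat||_F / ||A||_F
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)

# Each factor stores 2n nonzeros, versus n^2 entries for the dense matrix
nnz_per_factor = [int(np.count_nonzero(F)) for F in factors]
```

Here the product reproduces the target exactly, so `rel_err` is zero, while the factors together hold $2n\log_2(n)$ nonzeros instead of $n^2$. For a general target matrix no exact factorization need exist, which is the approximation setting the abstract describes.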

Paper Structure

This paper contains 62 sections, 46 theorems, 128 equations, 9 figures, 2 tables, 5 algorithms.

Key Result

Theorem 3.4

If all components $\mathbf{U}_i$ of $\varphi(\mathbf{L},\mathbf{R})$ are pairwise disjoint or identical, then Algorithm \ref{algo:algorithm1} yields an optimal solution of Problem \eqref{eq:FSMF}, and the infimum of Problem \eqref{eq:FSMF} is given by a displayed expression [not recovered in this extract], where $c := \sum_{(i,j) \notin \mathtt{supp}(\mathbf{L}\mathbf{R})} \mathbf{A}[i,j]^2$. Note that $\mathbf{L}\mathbf{R}$ is a product of two binary matrices.
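The constant $c$ collects the energy of the entries of $\mathbf{A}$ that no admissible product can reach: if $\mathbf{X}$ and $\mathbf{Y}$ have supports included in the binary matrices $\mathbf{L}$ and $\mathbf{R}$, then $\mathbf{X}\mathbf{Y}$ vanishes outside $\mathtt{supp}(\mathbf{L}\mathbf{R})$. The sketch below computes $c$ for a small example. The function name `off_support_energy` and the toy matrices are assumptions made for illustration, and the code covers only this constant, not the algorithm referred to in the theorem.

```python
import numpy as np

def off_support_energy(A, L_mask, R_mask):
    """Sum of A[i, j]^2 over entries (i, j) outside supp(L R).

    L_mask and R_mask are binary support constraints (hypothetical inputs).
    """
    # supp(LR): positions where the product of the binary matrices is nonzero
    product_support = (L_mask.astype(int) @ R_mask.astype(int)) > 0
    return float(np.sum(A[~product_support] ** 2))

# Toy support constraints: the two rank-one supports are
# {0, 1} x {0, 1} and {2} x {2}, which are pairwise disjoint.
L_mask = np.array([[1, 0],
                   [1, 0],
                   [0, 1]])
R_mask = np.array([[1, 1, 0],
                   [0, 0, 1]])

A = np.arange(1.0, 10.0).reshape(3, 3)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

c = off_support_energy(A, L_mask, R_mask)
# Unreachable entries are 3, 6, 7, 8, so c = 9 + 36 + 49 + 64 = 158
```

Whatever factors are chosen, the squared approximation error is at least `c`, which is why this term appears additively in the expression of the infimum.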

Figures (9)

  • Figure 1: Illustration of the support of a factor with pattern ${\boldsymbol{\pi}}=(a,b,c,d)$. The colored squares indicate the indices belonging to the support. The sub-figures (1), (2), (3) illustrate respectively the concepts of factor, block and sub-block.
  • Figure 1: An example of support constraints $(\mathbf{L},\mathbf{R})$ and the supports of the corresponding rank-one contributions. Colored parts indicate indices inside the support constraints $\mathbf{L}, \mathbf{R}$ and $\mathbf{U}_i$ for $i \in \llbracket 3 \rrbracket$. $\{1,2\}$ and $\{3\}$ are the two equivalence classes (\ref{def:classequivalence}).
  • Figure 1: Relative approximation errors defined as $\|\mathbf{A} - \hat{\mathbf{A}}\|_F / \|\mathbf{A}\|_F$ vs. running time of the different algorithms. The target matrix $\mathbf{A}$ is the Hadamard matrix of size $1024 \times 1024$, and $\hat{\mathbf{A}}$ is the computed approximation for Problem \ref{eq:butterfly-approximation-pb} associated with the square dyadic butterfly architecture.
  • Figure 1: The partition $\mathcal{P}_{\mathtt{col}}({\boldsymbol{\pi}}, r)$ with ${\boldsymbol{\pi}} = (2, 3, 4, 2)$ and $r = 2$. Each set $P_{t,k}, (t,k) \in \llbracket 4 \rrbracket \times \llbracket 2 \rrbracket$ gathers the indices of the columns of $\mathbf{S}_{\boldsymbol{\pi}}$ of a given color. See also \ref{fig:DBfactorillu}. Same color indicates columns of the same sets $P_{t,k}$ (cf. \ref{def:formula-partition}).
  • Figure 1: Illustration of the complementary low-rank property associated with the square dyadic butterfly architecture ${\boldsymbol{\beta}} = ({\boldsymbol{\pi}}_\ell)_{\ell=1}^L$ for matrices of size $n \times n$ with $n=16$ and $L=\log_2(n)=4$. For each $\ell \in \llbracket L-1 \rrbracket$, we represent the partition of the matrix indices $\llbracket n \rrbracket \times \llbracket n \rrbracket$ into $\{ R_P \times C_P, \, P \in \mathcal{P}(\mathbf{S}_{{\boldsymbol{\pi}}_1 * \ldots * {\boldsymbol{\pi}}_\ell},\mathbf{S}_{{\boldsymbol{\pi}}_{\ell+1} * \ldots * {\boldsymbol{\pi}}_L} ) \}$ using different colors, where each color represents one element of the partition.
  • ...and 4 more figures

Theorems & Definitions (113)

  • Definition 3.1: Rank-one contribution supports \cite{le2022spurious,zheng2023efficient}
  • Remark 3.2
  • Definition 3.3: Equivalence classes of rank-one supports, representative rank-one supports \cite{le2022spurious}
  • Theorem 3.4: Tractable support constraints of Problem \ref{eq:FSMF} \cite{le2022spurious}
  • Proof 1: Sketch of proof
  • Remark 3.5
  • Definition 4.1: Kronecker-sparse factors and their sparsity patterns
  • Example 4.2
  • Lemma 4.3: Sparsity level of a $\boldsymbol{\pi}$-factor
  • Remark 4.4
  • ...and 103 more