Butterfly factorization with error guarantees
Quoc-Tung Le, Léon Zheng, Elisa Riccietti, Rémi Gribonval
TL;DR
This work develops a general theory of deformable butterfly factorization by introducing Kronecker-sparse factors and the concept of chainability, which guarantees that the factorization problem admits an optimum and enables provable error bounds. It presents a hierarchical factorization algorithm and a refined butterfly algorithm with orthonormalization whose approximation error is within a constant factor $C_{\boldsymbol{\beta}}$ of the error of the best possible butterfly factorization of the target matrix; this constant is computable and depends only on the architecture. The framework unifies several known butterfly architectures (e.g., square dyadic, Monarch, deformable butterfly) and extends error guarantees to arbitrary chainable patterns, including a generalized complementary low-rank (CLR) property. Numerical experiments show approximations that are faster to compute than those of gradient-based methods and that are robust to noise when orthonormalization is used. The results offer practical guidance for secure fast linear operator evaluation in applications requiring structured matrices with fixed sparsity, and open avenues for handling unknown permutations and hardware-aware implementations.
Abstract
In this paper, we investigate the butterfly factorization problem, i.e., the problem of approximating a matrix by a product of sparse and structured factors. We propose a new formal mathematical description of such factors that encompasses many variants of butterfly factorization with different choices of prescribed sparsity patterns. Among these supports, we identify those for which the factorization problem admits an optimum, thanks to a new property called "chainability". For those supports, we propose a new butterfly algorithm that yields an approximate solution to the butterfly factorization problem and that is supported by stronger theoretical guarantees than existing factorization methods. Specifically, we show that the ratio of the approximation error to the minimum achievable error is bounded by a constant that is independent of the target matrix.
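To make "a product of sparse and structured factors" concrete, the following is a minimal NumPy sketch for the square dyadic case only. It assumes that the support of the $\ell$-th factor has the Kronecker form $I_{2^{\ell-1}} \otimes 1_{2 \times 2} \otimes I_{2^{L-\ell}}$, and it uses the Hadamard matrix as a target that is known to factor exactly over these supports. The function names are hypothetical, and this illustration is not the factorization algorithm proposed in the paper.

```python
from functools import reduce

import numpy as np

H2 = np.array([[1, 1], [1, -1]])


def butterfly_support(ell, L):
    """0/1 support of the ell-th square dyadic butterfly factor, 1 <= ell <= L.

    Assumed form: I_{2^(ell-1)} (x) ones(2, 2) (x) I_{2^(L-ell)}.
    """
    left = np.eye(2 ** (ell - 1), dtype=int)
    right = np.eye(2 ** (L - ell), dtype=int)
    return np.kron(np.kron(left, np.ones((2, 2), dtype=int)), right)


def hadamard_factors(L):
    """Sparse factors I (x) H2 (x) I whose product is the 2^L x 2^L Hadamard matrix."""
    return [
        np.kron(
            np.kron(np.eye(2 ** (ell - 1), dtype=int), H2),
            np.eye(2 ** (L - ell), dtype=int),
        )
        for ell in range(1, L + 1)
    ]


def hadamard(L):
    """Dense Hadamard matrix built as the L-fold Kronecker power of H2."""
    return reduce(np.kron, [H2] * L, np.array([[1]]))


L = 4
factors = hadamard_factors(L)
product = reduce(np.matmul, factors)

# The product of the L sparse factors reproduces the dense matrix exactly.
exact = np.array_equal(product, hadamard(L))

# Every factor vanishes outside its prescribed support.
in_support = all(
    np.all(f[butterfly_support(ell, L) == 0] == 0)
    for ell, f in enumerate(factors, start=1)
)

# Nonzeros stored by the factorization versus the dense matrix.
nnz_factors = sum(int(np.count_nonzero(f)) for f in factors)
nnz_dense = int(np.count_nonzero(hadamard(L)))
```

Each factor has two nonzero entries per row and per column, so the factorization stores $2L \cdot 2^L$ nonzeros instead of $4^L$. For a general target matrix no exact factorization over these supports need exist, which is the setting where the approximation guarantees described above apply.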
