In-place accumulation of fast multiplication formulae
Jean-Guillaume Dumas, Bruno Grenet
TL;DR
The paper addresses how to run fast (sub-quadratic or sub-cubic) bilinear computations in place when the output is also an input, as in C += AB. It relaxes the in-place model so that an algorithm may temporarily modify its inputs, provided they are restored afterwards, and gives a generic transformation from any bilinear formula to an in-place accumulating variant, which extends to general linear accumulation. Concrete instances include an optimal in-place Strassen-Winograd-like matrix multiplication (7 recursive calls, 18 additions), in-place Karatsuba and Toom polynomial multiplications, and an FFT-based in-place polynomial multiplication. These are presented as the first in-place accumulating algorithms for fast polynomial multiplication and for Strassen-like (sub-cubic) matrix multiplication. Potential applications include symbolic computation, algebraic software, and memory-constrained numerical kernels, where large products can be accumulated without extra temporary storage.
Abstract
This paper deals with simultaneously fast and in-place algorithms for formulae where the result has to be linearly accumulated: some of the output variables are also input variables, linked by a linear dependency. Fundamental examples include the in-place accumulated multiplication of polynomials or matrices, C+=AB. The difficulty is to combine in-place computations with fast algorithms: those usually come at the expense of (potentially large) extra temporary space, but with accumulation the output variables are not even available to store intermediate values. We first propose a novel automatic design of fast and in-place accumulating algorithms for any bilinear formulae (and thus for polynomial and matrix multiplication) and then extend it to any linear accumulation of a collection of functions. For this, we relax the in-place model to any algorithm allowed to modify its inputs, provided that those are restored to their initial state afterwards. This allows us, in fine, to derive unprecedented in-place accumulating algorithms for fast polynomial multiplications and for Strassen-like matrix multiplications.
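To make the relaxed model concrete, here is a minimal sketch in Python of a single Karatsuba step for two degree-1 polynomials, accumulated into c with three multiplications and no temporary variable. It illustrates the idea described in the abstract (inputs and outputs are modified along the way and the inputs are restored at the end); it is not taken from the paper, and the function name `karatsuba_accumulate_inplace` and the list-based representation are assumptions made for this example. In a recursive algorithm, each `+=` of a product would itself be a recursive accumulating call.

```python
def karatsuba_accumulate_inplace(a, b, c):
    """Compute c += a * b for a = a0 + a1*x and b = b0 + b1*x.

    a and b are lists of length 2, c is a list of length 3.
    Uses three multiplications and no temporary variable.
    a and b are modified during the call and restored before returning.
    """
    # Low product P0 = a0*b0: net effect c0 += P0 and c1 -= P0.
    c[1] += c[0]
    c[0] += a[0] * b[0]
    c[1] -= c[0]

    # High product P1 = a1*b1: net effect c2 += P1 and c1 -= P1.
    c[1] += c[2]
    c[2] += a[1] * b[1]
    c[1] -= c[2]

    # Middle product P2 = (a0+a1)*(b0+b1): the sums overwrite a0 and b0.
    a[0] += a[1]
    b[0] += b[1]
    c[1] += a[0] * b[0]

    # Restore the inputs to their initial state.
    b[0] -= b[1]
    a[0] -= a[1]


# Usage: accumulate (3 + 5x)(2 + 7x) into 1 + x + x^2.
a = [3, 5]
b = [2, 7]
c = [1, 1, 1]
karatsuba_accumulate_inplace(a, b, c)
print(c, a, b)
```

The first two groups use the old content of an output coefficient to cancel itself out, so that a product can be both added to one coefficient and subtracted from another without ever being stored separately. The third group shows the temporary modification of inputs that the relaxed model permits.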
