Fast in-place accumulation
Jean-Guillaume Dumas, Bruno Grenet
TL;DR
The paper addresses the long-standing space-time trade-off in fast algebraic algorithms by introducing a reversible in-place accumulation model, in which inputs may be temporarily modified provided they are restored afterwards. It gives a general transformation that converts any bilinear algorithm (and, more generally, any linear accumulation of a collection of functions) into an in-place accumulating variant, which yields fast Strassen-like matrix multiplication and fast polynomial multiplication with only $O(1)$ extra space. The same framework produces in-place algorithms for Toeplitz matrix operations, generalized convolutions, short products, power series division, and polynomial remainder and modular multiplication, often matching the asymptotic complexity of their not-in-place counterparts. The design of these algorithms is automatic, and the applications include multiplication in polynomial extensions of finite fields. Whether the logarithmic overhead that remains in some regimes can be removed is left open.
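As an illustration of the reversible model, here is a minimal sketch of an accumulating Karatsuba step on degree-1 polynomials. It is a hand-worked instance for this summary, not code taken from the paper; the function name `karatsuba_acc_inplace` and the list-based coefficient layout (low degree first) are assumptions. Each product is accumulated directly into the output, and the inputs `a` and `b` are modified and then restored.

```python
def karatsuba_acc_inplace(a, b, c):
    """Compute c += a * b for a = a0 + a1*x, b = b0 + b1*x, c of length 3.

    Uses three multiplications and no temporary array. The inputs a and b
    are modified during the call and restored before returning.
    """
    # Product a0*b0 contributes +1 to c[0] and -1 to c[1].
    # Pre-add c[0] into c[1] so that subtracting the updated c[0] leaves -a0*b0.
    c[1] += c[0]
    c[0] += a[0] * b[0]
    c[1] -= c[0]

    # Product a1*b1 contributes +1 to c[2] and -1 to c[1], same trick.
    c[1] += c[2]
    c[2] += a[1] * b[1]
    c[1] -= c[2]

    # Product (a0+a1)*(b0+b1) contributes +1 to c[1].
    # Form the sums inside the inputs, use them, then undo the changes.
    a[0] += a[1]
    b[0] += b[1]
    c[1] += a[0] * b[0]
    b[0] -= b[1]
    a[0] -= a[1]


a, b, c = [1, 2], [3, 4], [10, 20, 30]
karatsuba_acc_inplace(a, b, c)
print(a, b, c)
```

The net effect on `c[1]` is the addition of `(a0+a1)(b0+b1) - a0*b0 - a1*b1`, which is the middle coefficient of the product, while `a` and `b` end in their initial state.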
Abstract
This paper deals with simultaneously fast and in-place algorithms for formulae whose result has to be linearly accumulated: some output variables are also input variables, linked by a linear dependency. Fundamental examples include the in-place accumulated multiplication of polynomials or matrices (that is, with only $O(1)$ extra space). The difficulty is to combine in-place computations with fast algorithms, since the latter usually come at the expense of (potentially large) extra temporary space. We first propose a novel automatic design of fast and in-place accumulating algorithms for any bilinear formula (and thus for polynomial and matrix multiplication), and then extend it to any linear accumulation of a collection of functions. For this, we relax the in-place model to allow an algorithm to modify its inputs, provided that they are restored to their initial state afterwards. This allows us to derive in-place accumulating algorithms for fast polynomial multiplication and for Strassen-like matrix multiplication. We then consider the simultaneously fast and in-place computation of the Euclidean polynomial remainder $R = A \bmod B$. If $A$ and $B$ have respective degrees $m+n$ and $n$, and $M(k)$ denotes the complexity of a (not-in-place) algorithm to multiply two degree-$k$ polynomials, our algorithm uses $O((n/m) M(m)\log(m))$ arithmetic operations. If $M(n) = \Theta(n^{1+\epsilon})$ for some $\epsilon>0$, then our algorithm matches the not-in-place complexity bound of $O((n/m) M(m))$. We also propose variants that compute, still in-place and with the same complexity bounds, $A = A \bmod B$, $R += A \bmod B$ and $R += AC \bmod B$, the last of which is multiplication in a finite field extension. To achieve this, we develop techniques for Toeplitz matrix operations, generalized convolutions, short products, and power series division and remainder whose output is also part of the input.
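To make the semantics of the variant $A = A \bmod B$ concrete, the following sketch overwrites $A$ with its remainder using schoolbook long division. It is only a quadratic-time baseline with $O(1)$ extra scalars, not the fast algorithm of the paper; the name `rem_inplace`, the low-degree-first coefficient lists, and the restriction to a monic $B$ (which avoids inverses) are assumptions made for the example.

```python
def rem_inplace(A, B):
    """Overwrite A with A mod B, for a monic divisor B.

    Coefficients are stored low degree first. After the call, the first
    deg(B) entries of A hold the remainder and the remaining entries are 0.
    Only a constant number of scalar variables is used besides A and B.
    """
    n = len(B) - 1  # degree of B
    if B[n] != 1:
        raise ValueError("this sketch assumes a monic divisor")
    # Eliminate the leading coefficients of A from the top down.
    for i in range(len(A) - 1, n - 1, -1):
        q = A[i]  # next quotient coefficient, since B is monic
        for j in range(n + 1):
            A[i - n + j] -= q * B[j]


A = [4, 3, 2, 1]  # x^3 + 2x^2 + 3x + 4
B = [1, 0, 1]     # x^2 + 1
rem_inplace(A, B)
print(A)
```

Here the quotient is discarded coefficient by coefficient, so the output occupies the space of the input and `B` is never modified.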
