In-place accumulation of fast multiplication formulae

Jean-Guillaume Dumas, Bruno Grenet

TL;DR

The paper addresses how to run fast (sub-quadratic or sub-cubic) bilinear computations in place when the output is also an input, as in C+=AB. It relaxes the in-place model so that an algorithm may temporarily modify its inputs, provided it restores them afterwards, and proves a generic transformation from bilinear formulae to in-place accumulating variants, which extends to general linear accumulation. The authors develop concrete building blocks: an optimal in-place Strassen-Winograd-like matrix multiplication (7 recursive calls, 18 additions), in-place Karatsuba and Toom polynomial multiplications, and an FFT-based in-place polynomial multiplication. These constructions yield the first in-place accumulating algorithms that keep the sub-cubic cost of fast matrix multiplication and the sub-quadratic to quasi-linear cost of fast polynomial multiplication. Potential applications include symbolic computation, algebraic software, and memory-constrained numerical kernels, where large fast computations could run without extra temporary storage.

Abstract

This paper deals with simultaneously fast and in-place algorithms for formulae where the result has to be linearly accumulated: some of the output variables are also input variables, linked by a linear dependency. Fundamental examples include the in-place accumulated multiplication of polynomials or matrices, C+=AB. The difficulty is to combine in-place computations with fast algorithms: those usually come at the expense of (potentially large) extra temporary space, but with accumulation the output variables are not even available to store intermediate values. We first propose a novel automatic design of fast and in-place accumulating algorithms for any bilinear formulae (and thus for polynomial and matrix multiplication) and then extend it to any linear accumulation of a collection of functions. For this, we relax the in-place model to any algorithm allowed to modify its inputs, provided that those are restored to their initial state afterwards. This allows us, in fine, to derive unprecedented in-place accumulating algorithms for fast polynomial multiplications and for Strassen-like matrix multiplications.
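
The relaxed in-place model described in the abstract can be illustrated with a toy case. The sketch below is not taken from the paper: the function name `accumulate_product_of_sums` and the data are hypothetical, and it only shows the idea of borrowing an input cell to hold an intermediate sum, accumulating into the output, and then undoing the change so the inputs end up in their initial state.

```python
def accumulate_product_of_sums(c, a, b):
    """Compute c[0] += (a[0] + a[1]) * (b[0] + b[1]) without storing the sums
    in extra variables (hypothetical helper, for illustration only)."""
    a[0] += a[1]          # a[0] temporarily holds a0 + a1
    b[0] += b[1]          # b[0] temporarily holds b0 + b1
    c[0] += a[0] * b[0]   # accumulate the product into the output
    b[0] -= b[1]          # restore b to its initial state
    a[0] -= a[1]          # restore a to its initial state


a, b, c = [2, 3], [5, 7], [100]
accumulate_product_of_sums(c, a, b)
print(a, b, c)  # prints [2, 3] [5, 7] [160]
```

The inputs are modified during the call but are unchanged once it returns, which is the property the relaxed model asks for.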

Paper Structure

This paper contains 11 sections, 13 theorems, 17 equations, 3 tables, 9 algorithms.

Key Result

Theorem 3

Algorithm alg:bilin is correct, in-place, and requires $t$ MUL, $2(\#\alpha+\#\beta+\#\mu)-5t$ ADD and $2(\sharp\alpha+\sharp\beta+\sharp\mu)$ SCA operations.
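
This excerpt does not reproduce the steps of alg:bilin, so the following is only a sketch written in the spirit of the theorem, not the authors' algorithm. It assumes a bilinear formula given by three coefficient matrices `alpha`, `beta` and `mu` with $t$ products, assumes that $\#$ counts non-zero entries (the excerpt does not define it), and works over exact rationals so that every scaling can be undone. The name `inplace_bilinear_accumulate` and the Karatsuba coefficients used as test data are illustrative choices.

```python
from fractions import Fraction


def fold(row, v):
    """Overwrite one cell v[p] with the linear form sum_i row[i] * v[i]."""
    p = next(i for i, x in enumerate(row) if x != 0)
    v[p] *= row[p]
    adds = 0
    for i, x in enumerate(row):
        if i != p and x != 0:
            v[p] += x * v[i]
            adds += 1
    return p, adds


def unfold(row, v, p):
    """Exact inverse of fold: restore v[p] to its initial value."""
    adds = 0
    for i, x in enumerate(row):
        if i != p and x != 0:
            v[p] -= x * v[i]
            adds += 1
    v[p] /= row[p]
    return adds


def inplace_bilinear_accumulate(alpha, beta, mu, a, b, c):
    """c[k] += sum_l mu[k][l] * (alpha[l] . a) * (beta[l] . b), restoring a and b.

    Returns the number of additions/subtractions performed.
    """
    adds = 0
    for ell in range(len(alpha)):
        i, n = fold(alpha[ell], a)
        adds += n
        j, n = fold(beta[ell], b)
        adds += n
        col = [mu[k][ell] for k in range(len(c))]
        k0 = next(k for k, x in enumerate(col) if x != 0)
        others = [k for k, x in enumerate(col) if k != k0 and x != 0]
        c[k0] /= col[k0]                 # pre-scale the pivot output
        for k in others:
            c[k] -= col[k] * c[k0]       # pre-correct the other outputs
            adds += 1
        c[k0] += a[i] * b[j]             # the single multiplication of this step
        adds += 1
        for k in others:
            c[k] += col[k] * c[k0]       # each one gains col[k] * product
            adds += 1
        c[k0] *= col[k0]                 # pivot gains col[k0] * product
        adds += unfold(beta[ell], b, j)
        adds += unfold(alpha[ell], a, i)
    return adds


# Karatsuba for two degree-1 polynomials: t = 3 products.
alpha = [[1, 0], [0, 1], [1, 1]]
beta = [[1, 0], [0, 1], [1, 1]]
mu = [[1, 0, 0], [-1, -1, 1], [0, 1, 0]]
a = [Fraction(2), Fraction(3)]                 # 2 + 3x
b = [Fraction(5), Fraction(7)]                 # 5 + 7x
c = [Fraction(1), Fraction(1), Fraction(1)]    # accumulator 1 + x + x^2
adds = inplace_bilinear_accumulate(alpha, beta, mu, a, b, c)
```

On this instance, the non-zero counts are 4, 4 and 5, so under the stated assumption the ADD bound evaluates to $2(4+4+5)-5\cdot 3 = 11$, which matches the counter returned by the sketch. Multiplications by coefficients are not counted here, so the SCA term is not checked.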

Theorems & Definitions (30)

  • Example 1
  • Remark 2
  • Theorem 3
  • proof
  • Remark 4
  • Theorem 5
  • Lemma 6
  • proof
  • Lemma 7
  • proof
  • ...and 20 more