Fast Evaluation of Truncated Neumann Series by Low-Product Radix Kernels

Piyush Sao

TL;DR

This work speeds up the evaluation of truncated Neumann series in dense settings by introducing an exact radix-9 kernel and an approximate radix-15 kernel that reduce matrix-product counts. It develops a general residual-based radix-kernel framework that accommodates spillover, preserving convergence while achieving the best known asymptotic rate of about 1.54 products per doubling of the series length. The radix-9 kernel, which has exact rational coefficients, gives a 21% reduction over binary splitting (repeated squaring). The radix-15 kernel gives about 23% (1.54 versus 2 products per doubling), at the cost of a small residual floor due to spillover. Together, these results offer practical routes to faster inverse approximation and polynomial preconditioning, with guidelines for selecting the radix and handling approximate kernels.

Abstract

Truncated Neumann series $S_k(A)=I+A+\cdots+A^{k-1}$ are used in approximate matrix inversion and polynomial preconditioning. In dense settings, matrix-matrix products dominate the cost of evaluating $S_k$. Naive evaluation needs $k-1$ products, while splitting methods reduce this to $O(\log k)$. Repeated squaring, for example, uses $2\log_2 k$ products, so further gains require higher-radix kernels that extend the series by $m$ terms per update. Beyond the known radix-5 kernel, explicit higher-radix constructions were not available, and the existence of exact rational kernels was unclear. We construct radix kernels for $T_m(B)=I+B+\cdots+B^{m-1}$ and use them to build faster series algorithms. For radix 9, we derive an exact 3-product kernel with rational coefficients, which is the first exact construction beyond radix 5. This kernel yields $5\log_9 k\approx 1.58\log_2 k$ products, a 21% reduction from repeated squaring. For radix 15, numerical optimization yields a 4-product kernel that matches the target through degree 14 but has nonzero spillover (extra terms) at degrees $\ge 15$. Because spillover breaks the standard telescoping update, we introduce a residual-based radix-kernel framework that accommodates approximate kernels and retains coefficient $(\mu_m+2)/\log_2 m$. Within this framework, radix 15 attains $6/\log_2 15\approx 1.54$, the best known asymptotic rate. Numerical experiments support the predicted product-count savings and associated runtime trends.
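As an illustration of the repeated-squaring baseline mentioned in the abstract, the following is a minimal sketch, not the paper's code. It uses the identities $S_{2k}=S_k+A^kS_k$ and $A^{2k}=(A^k)^2$, which cost two products per doubling. The function names (`neumann_reference`, `neumann_doubling`) and the pure-Python exact-rational matrices are assumptions made for a self-contained example. The count follows the abstract's $2\log_2 k$ model, so it includes the trivial first product and the final squaring that a tuned implementation could skip.

```python
from fractions import Fraction

def identity(n):
    """n-by-n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def add(X, Y):
    """Entrywise sum of two square matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matmul(X, Y):
    """Dense matrix product of two n-by-n matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_reference(A, k):
    """S_k(A) = I + A + ... + A^(k-1) via the Horner recurrence S <- I + A S."""
    n = len(A)
    S = identity(n)
    for _ in range(k - 1):
        S = add(identity(n), matmul(A, S))
    return S

def neumann_doubling(A, t):
    """S_k(A) for k = 2**t by repeated squaring.

    Returns (S, products), where products counts the matmul calls made
    (two per doubling, so 2*t in total).
    """
    n = len(A)
    S = identity(n)   # S_1 = I
    P = A             # A^1
    products = 0
    for _ in range(t):
        S = add(S, matmul(P, S))  # S_{2k} = S_k + A^k S_k
        P = matmul(P, P)          # A^{2k} = (A^k)^2
        products += 2
    return S, products

# Hypothetical 2-by-2 test matrix, chosen only for illustration.
A = [[Fraction(1, 2), Fraction(1, 3)],
     [Fraction(0), Fraction(1, 4)]]
S8, count = neumann_doubling(A, 3)   # k = 8 terms
```

With exact rationals the doubling result can be compared to the reference for equality rather than to a tolerance. In floating point one would compare with a norm-based tolerance instead.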

Paper Structure (43 sections, 7 theorems, 17 equations, 2 figures, 5 tables)

Introduction
Background
Splitting Methods
Optimal radix for naive splitting.
Radix Kernels
Cost model.
Lower bound.
Prior Work: Quinary (Radix-5) Kernel
Open questions.
Radix-9 Kernel
Derivation strategy.
Approximate Radix-15 Kernel
Search methodology.
Circuit Structure
Optimization and Results
...and 28 more sections

Key Result

Theorem 3.1

The kernel $T_9(B) = I + B + \cdots + B^8$ is computed by:

Figures (2)

Figure 1: Convergence comparison ($d=64$, $\kappa(I-A) = 10^{4}$, log-spaced eigenvalues). Higher-radix methods reach machine precision with fewer matrix products.
Figure 2: Asymptotic coefficients (matrix products per $\log_2 k$) for each method. Lower is better. Radix-15 achieves the best rate at $1.54$.
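The coefficients plotted in Figure 2 can be reproduced from the formula $(\mu_m+2)/\log_2 m$ given in the abstract. The sketch below assumes that $\mu_m$ is the number of matrix products used by the radix-$m$ kernel (3 for radix 9 and 4 for radix 15, as stated in the abstract), and that repeated squaring corresponds to $m=2$ with $\mu_2=0$, which recovers its coefficient of 2. The function name `asymptotic_coefficient` is illustrative.

```python
import math

def asymptotic_coefficient(kernel_products, radix):
    """Matrix products per unit of log2(k): (mu_m + 2) / log2(m)."""
    return (kernel_products + 2) / math.log2(radix)

# (radix m, assumed kernel product count mu_m)
methods = {
    "repeated squaring": (2, 0),   # assumption: T_2(B) = I + B needs no product
    "radix 9": (9, 3),             # exact 3-product kernel (abstract)
    "radix 15": (15, 4),           # approximate 4-product kernel (abstract)
}

baseline = asymptotic_coefficient(0, 2)
rows = {}
for name, (m, mu) in methods.items():
    c = asymptotic_coefficient(mu, m)
    saving = 1.0 - c / baseline    # relative reduction versus repeated squaring
    rows[name] = (round(c, 2), round(100 * saving))
    print(f"{name:18s} coefficient {c:.2f}  saving {100 * saving:.0f}%")
```

Under these assumptions the radix-9 and radix-15 coefficients round to 1.58 and 1.54, matching the values quoted in the abstract.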

Theorems & Definitions (19)

Theorem 3.1: Radix-9 kernel in 3 products
proof
Remark 3.2: Rational coefficients
Remark 4.1: Non-uniqueness
Definition 5.1: Approximate radix-$m$ kernel
Definition 5.2: Error map
Lemma 5.3: Error order
proof
Definition 5.4: General radix-kernel summation
Lemma 5.5: Composition identity
...and 9 more
