Fast Evaluation of Truncated Neumann Series by Low-Product Radix Kernels
Piyush Sao
TL;DR
This work speeds up the evaluation of truncated Neumann series by introducing an exact radix-9 kernel and an approximate radix-15 kernel that reduce the number of matrix products in dense settings. It develops a general residual-based radix-kernel framework that accommodates spillover, preserving convergence while achieving the best known asymptotic rate of about 1.54 products per doubling of the series length. The radix-9 kernel, which has exact rational coefficients, uses 21% fewer products than binary splitting; the radix-15 kernel raises the savings to about 23%, albeit with a small residual floor caused by spillover. Together, these results offer practical routes to faster approximate inversion and polynomial preconditioning, with clear guidelines for selecting the radix and handling inexact kernels across matrix-iteration tasks.
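As a quick check of the quoted rates, the following Python snippet evaluates the cost coefficient $(\mu_m+2)/\log_2 m$ given in the abstract for each radix and compares it with the $2\log_2 k$ products of binary splitting. The radix-9 and radix-15 values of $\mu_m$ come from the stated 3- and 4-product kernels; $\mu_2=0$ corresponds to $T_2(B)=I+B$, and $\mu_5=2$ for the known radix-5 kernel is an assumption here (it is what the factorization $T_5(B)=I+B+B^2(I+B+B^2)$ needs).

```python
import math

# mu_m = products inside one kernel call, per radix m.
# mu_9 and mu_15 are stated in the text; mu_2 = 0 is binary splitting;
# mu_5 = 2 is an assumption based on T_5(B) = I + B + B^2 (I + B + B^2).
KERNEL_PRODUCTS = {2: 0, 5: 2, 9: 3, 15: 4}


def coefficient(m, mu):
    """Asymptotic products per doubling of k: (mu_m + 2) / log2(m)."""
    return (mu + 2) / math.log2(m)


for m, mu in KERNEL_PRODUCTS.items():
    c = coefficient(m, mu)
    saving = 100 * (1 - c / 2)  # relative to binary splitting, 2 log2 k
    print(f"radix {m:2d}: {c:.3f} log2 k products, {saving:.1f}% fewer")
# radix  2: 2.000 log2 k products, 0.0% fewer
# radix  5: 1.723 log2 k products, 13.9% fewer
# radix  9: 1.577 log2 k products, 21.1% fewer
# radix 15: 1.536 log2 k products, 23.2% fewer
```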
Abstract
Truncated Neumann series $S_k(A)=I+A+\cdots+A^{k-1}$ are used in approximate matrix inversion and polynomial preconditioning. In dense settings, matrix-matrix products dominate the cost of evaluating $S_k$. Naive evaluation needs $k-1$ products, while splitting methods reduce this to $O(\log k)$. Repeated squaring, for example, uses $2\log_2 k$ products, so further gains require higher-radix kernels that multiply the series length by $m$ per update through the identity $S_{mk}(A)=S_k(A)\,T_m(A^k)$. Beyond the known radix-5 kernel, no explicit higher-radix constructions were available, and it was unclear whether exact rational kernels exist. We construct radix kernels for $T_m(B)=I+B+\cdots+B^{m-1}$ and use them to build faster series algorithms. For radix 9, we derive an exact 3-product kernel with rational coefficients, the first exact construction beyond radix 5. This kernel yields $5\log_9 k\approx 1.58\log_2 k$ products, a 21% reduction from repeated squaring. For radix 15, numerical optimization yields a 4-product kernel that matches the target through degree 14 but has nonzero spillover (extra terms) at degrees $\ge 15$. Because spillover breaks the standard telescoping update, we introduce a residual-based radix-kernel framework that accommodates approximate kernels and retains the cost coefficient $(\mu_m+2)/\log_2 m$, where $\mu_m$ is the number of products in one kernel evaluation. Within this framework, radix 15 attains $6/\log_2 15\approx 1.54$, the best known asymptotic rate. Numerical experiments support the predicted product-count savings and the associated runtime trends.
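The abstract does not spell out the update itself, so the following is only a minimal NumPy sketch under our own assumptions, not the paper's implementation. It applies $S \leftarrow S\,T$ with $T \approx T_m(R)$ and tracks $R = I-(I-A)S$, which equals $A^k$ when $S=S_k(A)$. With an exact kernel, the step $R \leftarrow I - T + RT$ is the telescoping identity $(I-B)T_m(B)=I-B^m$; for any kernel output $T$ it still gives the true residual of the new $S$, which is one way an approximate kernel can be accommodated. Each step costs the kernel's $\mu_m$ products plus 2, matching $(\mu_m+2)\log_m k$. All names (`ProductCounter`, `radix_series`, `kernel_radix2`, `kernel_radix5`, `kernel_radix5_spill`) are hypothetical; `kernel_radix5_spill` is an invented kernel with a degree-5 spillover term that only mimics spillover and is not the paper's radix-15 kernel, whose coefficients are not given here.

```python
import numpy as np


class ProductCounter:
    """Counts dense matrix-matrix products, the cost measure used in the text."""

    def __init__(self):
        self.products = 0

    def mm(self, X, Y):
        self.products += 1
        return X @ Y


def kernel_radix2(B, pc):
    """T_2(B) = I + B with no products (mu_2 = 0): binary splitting."""
    return np.eye(B.shape[0]) + B


def kernel_radix5(B, pc):
    """Exact T_5(B) = I + B + B^2 (I + B + B^2) with 2 products (mu_5 = 2)."""
    I = np.eye(B.shape[0])
    B2 = pc.mm(B, B)
    return I + B + pc.mm(B2, I + B + B2)


def kernel_radix5_spill(B, pc, eps=1e-3):
    """Hypothetical inexact kernel T_5(B) + eps * B^5 (spillover at degree 5).

    Illustration only: it is not the paper's radix-15 kernel and costs 3 products.
    """
    I = np.eye(B.shape[0])
    B2 = pc.mm(B, B)
    B3 = pc.mm(B2, B)
    return I + B + pc.mm(B2, I + B + B2 + eps * B3)


def radix_series(A, kernel, steps, pc):
    """Residual-tracking radix update (a sketch, not the paper's code).

    Invariant: R == I - (I - A) @ S. Starting from S = I gives R = A; with an
    exact radix-m kernel, after j steps S = S_{m^j}(A) and R = A^{m^j}.
    """
    I = np.eye(A.shape[0])
    S, R = I.copy(), A.copy()
    for _ in range(steps):
        T = kernel(R, pc)        # approximates T_m(R): mu_m products
        S = pc.mm(S, T)          # extend the series: 1 product
        R = I - T + pc.mm(R, T)  # new residual I - (I - R) T: 1 product
    return S, R


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A *= 0.5 / np.linalg.norm(A, 2)  # spectral norm 0.5, so the series converges
I = np.eye(6)

pc = ProductCounter()
S, R = radix_series(A, kernel_radix5, steps=2, pc=pc)  # 5^2 = 25 terms
naive = sum(np.linalg.matrix_power(A, j) for j in range(25))
print(pc.products, np.allclose(S, naive))  # 8 True
```

With the exact radix-5 kernel, two steps produce $S_{25}(A)$ in 8 products instead of the 24 needed by term-by-term evaluation. Swapping in `kernel_radix5_spill` means $S$ is no longer exactly a truncated series, but $R$ still equals $I-(I-A)S$, and in this example $S$ still approaches $(I-A)^{-1}$ because $\|A\|_2 = 0.5$.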
