A note on Neuberger's double pass algorithm
Ting-Wai Chiu, Tung-Han Hsieh
TL;DR
The paper analyzes Neuberger's double-pass algorithm for computing $R(H^2)\,Y$, a rational approximation to $(H^2)^{-1/2}\,Y$, in the overlap Dirac operator framework. It shows that the flop count $F_2$ of the double pass is essentially independent of the degree $n$ of the rational approximation when the number of lattice sites is much larger than the number of conjugate gradient iterations, so high-precision results with large $n$ come at negligible extra cost. It identifies two thresholds: $n_T$ (about $12$–$25$ on common platforms), above which the double pass is faster than the single pass in CPU time, and a larger $n_F$ (≈$59$ in their tests), above which the double pass also needs fewer floating-point operations. Numerical tests across architectures corroborate the analysis, showing speedups of around 25–31% at moderate $n$. Because the cost barely grows with $n$, the sign function can be approximated accurately enough to preserve chiral symmetry to high precision, which makes the double pass a favorable method for lattice QCD computations that require accurate sign-function evaluations.
Abstract
We analyze Neuberger's double pass algorithm for the matrix-vector multiplication $R(H) \cdot Y$ (where $R(H)$ is a rational polynomial of degree $(n-1,n)$ in the positive definite operator $H$), and show that the number of floating point operations is independent of the degree $n$, provided that the number of sites is much larger than the number of iterations in the conjugate gradient. This implies that the matrix-vector product $ (H)^{-1/2} Y \simeq R^{(n-1,n)}(H) \cdot Y $ can be approximated to very high precision with sufficiently large $n$, without noticeable extra cost. Further, we show that there exists a threshold $ n_T $ such that the double pass is faster than the single pass for $ n > n_T $, where $ n_T \simeq 12 $–$ 25 $ for most platforms.
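
To make the two-pass idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: it swaps the conjugate gradient recurrences for the mathematically equivalent Lanczos form, and it assumes the rational polynomial is written as a partial-fraction sum $R(H) = \sum_{l=1}^{n} b_l\,(H + d_l)^{-1}$. The function names and the coefficients `b`, `d` in the usage note are hypothetical placeholders, not the optimal coefficients used in the paper. The first pass keeps only scalars, the $n$-dependent work is done on a small $k \times k$ tridiagonal matrix, and the second pass accumulates a single result vector, so the number of full-size vector operations does not depend on $n$.

```python
import numpy as np

def lanczos_scalars(H, Y, k, tol=1e-14):
    """Pass 1: run up to k Lanczos steps on H, keeping only the scalars."""
    alphas, betas = [], []
    q_prev = np.zeros_like(Y)
    q = Y / np.linalg.norm(Y)
    beta = 0.0
    for j in range(k):
        w = H @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < tol:   # last step, or Krylov space exhausted
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)

def combination_coeffs(alphas, betas, b, d, norm_Y):
    """Scalar-only step: c = |Y| * sum_l b_l (T + d_l)^{-1} e_1.

    This is the only place where the degree n = len(b) enters, and it
    acts on the small tridiagonal matrix T, not on full-size vectors.
    """
    m = len(alphas)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    e1 = np.zeros(m)
    e1[0] = 1.0
    c = np.zeros(m)
    for b_l, d_l in zip(b, d):
        c += b_l * np.linalg.solve(T + d_l * np.eye(m), e1)
    return norm_Y * c

def accumulate(H, Y, alphas, betas, c):
    """Pass 2: regenerate the Lanczos vectors and sum c_j * q_j into one vector."""
    X = np.zeros_like(Y)
    q_prev = np.zeros_like(Y)
    q = Y / np.linalg.norm(Y)
    beta = 0.0
    for j in range(len(c)):
        X += c[j] * q
        if j == len(c) - 1:
            break
        w = H @ q - beta * q_prev - alphas[j] * q
        beta = betas[j]
        q_prev, q = q, w / beta
    return X

def double_pass(H, Y, b, d, k):
    """Approximate sum_l b_l (H + d_l)^{-1} Y with two passes over H."""
    alphas, betas = lanczos_scalars(H, Y, k)
    c = combination_coeffs(alphas, betas, b, d, np.linalg.norm(Y))
    return accumulate(H, Y, alphas, betas, c)
```

As a usage example, `double_pass(H, Y, b=[0.3, 0.5, 0.2], d=[0.1, 1.0, 5.0], k=40)` with a symmetric positive definite `H` can be compared against the direct sum of dense solves `b_l * np.linalg.solve(H + d_l * np.eye(len(Y)), Y)`. Increasing the number of poles only lengthens the loop in `combination_coeffs`; both passes over `H` stay the same, which is the property the abstract describes.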
