A note on Neuberger's double pass algorithm

Ting-Wai Chiu; Tung-Han Hsieh

TL;DR

The paper analyzes Neuberger's double-pass algorithm for computing $R(H^2)\,Y$ as an approximation to $(H^2)^{-1/2}\,Y$ in the overlap Dirac operator framework. It shows that the flop count $F_2$ of the double pass is essentially independent of the degree $n$ of the rational approximation when the lattice volume is large, so high-precision results with large $n$ come at negligible extra cost. It identifies a threshold $n_T$ (about $12$–$25$ on common platforms) above which the double pass is faster than the single pass, and a larger value $n_F$ (≈$59$ in their tests) where the double pass is advantageous in CPU time. Numerical tests across architectures corroborate the theory, showing speedups of about 25–31% at moderate $n$. They also show that the double pass preserves chiral symmetry without significant extra cost as $n$ grows, which makes it a favorable method for lattice QCD computations that require accurate sign-function evaluations.
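For orientation, a rational function of degree $(n-1,n)$ with simple poles can be written as a sum of $n$ shifted inverses, which is the form a multi-shift solver acts on. The coefficient names $b_l$ and $d_l$ below are assumed notation, and positive shifts $d_l$ are an assumption that holds for the usual approximations to the inverse square root:

$$
(H^2)^{-1/2}\,Y \;\simeq\; R^{(n-1,n)}(H^2)\,Y \;=\; \sum_{l=1}^{n} \frac{b_l}{H^2 + d_l}\,Y, \qquad d_l > 0 .
$$

Each term is one shifted linear solve. A single-pass multi-shift solver updates $n$ solution vectors per iteration, which is where its cost grows with $n$; the double pass, as described above, avoids that growth in the volume-sized work.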

Abstract

We analyze Neuberger's double pass algorithm for the matrix-vector multiplication $R(H) \cdot Y$ (where $R(H)$ is a rational polynomial of degree $(n-1,n)$ in the positive definite operator $H$), and show that the number of floating point operations is independent of the degree $n$, provided that the number of sites is much larger than the number of iterations in the conjugate gradient. This implies that the matrix-vector product $(H)^{-1/2} Y \simeq R^{(n-1,n)}(H) \cdot Y$ can be approximated to very high precision with sufficiently large $n$, without noticeable extra cost. Further, we show that there exists a threshold $n_T$ such that the double pass is faster than the single pass for $n > n_T$, where $n_T \simeq 12$–$25$ for most platforms.
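The two-pass idea can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the abstract: it uses a Lanczos recursion instead of the conjugate gradient recursion analyzed in the paper, Neuberger's polar-form coefficients as a stand-in for whichever rational approximation is actually used, and a small dense test matrix in place of a lattice operator. The function names `polar_coeffs`, `lanczos`, and `double_pass` are hypothetical.

```python
import numpy as np

def polar_coeffs(n):
    # Assumed stand-in coefficients (Neuberger's polar form):
    # x^(-1/2) ~ sum_l b[l] / (x + d[l])
    theta = np.pi * (np.arange(1, n + 1) - 0.5) / (2 * n)
    return 1.0 / (n * np.cos(theta) ** 2), np.tan(theta) ** 2

def lanczos(H, y, k, coeffs=None):
    """Run k Lanczos steps from y, holding only a few vectors at a time.

    coeffs is None -> pass 1: return the tridiagonal entries (alphas, betas).
    coeffs given   -> pass 2: return sum_j coeffs[j] * v_j.
    """
    v_prev = np.zeros_like(y)
    v = y / np.linalg.norm(y)
    beta = 0.0
    alphas, betas = [], []
    acc = np.zeros_like(y)
    for j in range(k):
        if coeffs is not None:
            acc += coeffs[j] * v          # accumulate the final answer
        w = H @ v - beta * v_prev
        alpha = v @ w
        w = w - alpha * v
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / beta
    if coeffs is not None:
        return acc
    return np.array(alphas), np.array(betas)

def double_pass(H, y, n, k):
    b, d = polar_coeffs(n)
    # Pass 1: collect scalars only; no per-shift vectors are stored.
    alphas, betas = lanczos(H, y, k)
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    e1 = np.zeros(k)
    e1[0] = 1.0
    # Small k-by-k solves: the only step whose cost depends on n.
    c = sum(bl * np.linalg.solve(T + dl * np.eye(k), e1) for bl, dl in zip(b, d))
    # Pass 2: regenerate the Lanczos vectors and combine them.
    return np.linalg.norm(y) * lanczos(H, y, k, coeffs=c)

# Toy positive definite H with spectrum in [0.25, 4] (illustrative only).
rng = np.random.default_rng(0)
N, n, k = 200, 16, 60
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
eigs = np.linspace(0.25, 4.0, N)
H = (Q * eigs) @ Q.T
y = rng.standard_normal(N)

exact = (Q * eigs ** -0.5) @ (Q.T @ y)    # H^(-1/2) y via the eigenbasis
approx = double_pass(H, y, n, k)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.2e}")
```

In this sketch the work on vectors of length `N` (the matrix-vector products and vector updates in `lanczos`) is the same for any `n`; the degree enters only through the `k`-by-`k` solves, which mirrors the abstract's condition that the number of sites be much larger than the number of iterations.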
