A note on Neuberger's double pass algorithm

Ting-Wai Chiu, Tung-Han Hsieh

TL;DR

The paper analyzes Neuberger's double-pass algorithm for computing $R(H^2)\,Y$ as an approximation to $(H^2)^{-1/2}Y$ in the overlap Dirac operator framework. It proves that the flop count $F_2$ of the double pass is essentially independent of the degree $n$ of the rational approximation when the lattice volume is large, so high-precision results can be obtained with large $n$ at negligible extra cost. It identifies a threshold $n_T$ (about $12$–$25$ on common platforms) beyond which the double pass is faster than the single pass, and a larger threshold $n_F$ (≈$59$ in their tests) beyond which the double pass is also advantageous in total CPU time. Numerical tests across architectures corroborate the theory, showing speedups of roughly 25–31% at moderate $n$ and demonstrating that the double pass preserves chiral symmetry without significant extra cost as $n$ grows, making it a favorable method for lattice QCD computations that require accurate sign-function evaluations.

Abstract

We analyze Neuberger's double pass algorithm for the matrix-vector multiplication $R(H)\cdot Y$, where $R(H)$ is the $(n-1,n)$-th degree rational polynomial of a positive definite operator $H$, and show that the number of floating point operations is independent of the degree $n$, provided that the number of sites is much larger than the number of iterations in the conjugate gradient. This implies that the matrix-vector product $H^{-1/2}\,Y \simeq R^{(n-1,n)}(H) \cdot Y$ can be approximated to very high precision with sufficiently large $n$, without noticeable extra cost. Further, we show that there exists a threshold $n_T$ such that the double pass is faster than the single pass for $n > n_T$, where $n_T \simeq 12$–$25$ for most platforms.
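To make the two-pass structure concrete, here is a minimal NumPy sketch of the double-pass idea for a single linear system $A^{-1}y$ via conjugate gradient: pass 1 runs the CG recurrence and stores only the scalar coefficients $\alpha_i$, $\beta_i$ (no direction vectors are kept), and pass 2 reruns the same vector recurrence to regenerate the $p_i$ and accumulate the solution. This is an illustrative simplification, not the paper's algorithm: Neuberger's double pass applies the same scalar-recycling trick to all $n$ shifted systems of the rational approximation at once, which is why its flop count, dominated by the matrix-vector products in the two passes, is essentially independent of $n$. The function name and parameters below are hypothetical.

```python
import numpy as np

def double_pass_cg(A, y, iters=60):
    """Illustrative double-pass CG for x = A^{-1} y, A symmetric positive definite.

    Pass 1 stores only the CG scalars (alpha_i, beta_i); pass 2 regenerates
    the direction vectors p_i and accumulates x = sum_i alpha_i * p_i,
    so no direction vectors need to be stored between passes.
    """
    # --- pass 1: run the CG recurrence, recording scalars only ---
    r = y.copy()            # residual, with initial guess x_0 = 0
    p = r.copy()            # search direction
    alphas, betas = [], []
    for _ in range(iters):
        Ap = A @ p
        rr = r @ r
        alpha = rr / (p @ Ap)
        r = r - alpha * Ap
        beta = (r @ r) / rr
        alphas.append(alpha)
        betas.append(beta)
        p = r + beta * p

    # --- pass 2: replay the identical recurrence, accumulating the solution ---
    x = np.zeros_like(y)
    r = y.copy()
    p = r.copy()
    for alpha, beta in zip(alphas, betas):
        x = x + alpha * p           # x_0 = 0, so x = sum_i alpha_i p_i
        r = r - alpha * (A @ p)
        p = r + beta * p
    return x
```

Because both passes use the same stored scalars, pass 2 reproduces exactly the direction vectors of pass 1; the price is a second sweep of matrix-vector products, traded for the memory that a single-pass accumulation of many shifted solutions would require.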

Paper Structure

This paper contains 5 sections, 41 equations, 6 tables.