Evaluation of disconnected quark loops for hadron structure using GPUs

C. Alexandrou, M. Constantinou, V. Drach, K. Hadjiyiannakou, K. Jansen, G. Koutsou, A. Strelchenko, A. Vaquero

TL;DR

The paper investigates the GPU-accelerated calculation of disconnected quark loops in lattice QCD using twisted-mass fermions with $N_f=2+1+1$, benchmarking variance-reduction methods such as the stochastic estimator, Truncated Solver Method (TSM), one-end trick, time-dilution, and Hopping Parameter Expansion (HPE) on a $32^3\times64$ lattice. It contrasts approaches for key observables like the isoscalar nucleon axial charge $g_A$ and sigma-terms, detailing parameter tuning, convergence behavior, and the relative efficiency of each method. The results show that the one-end trick, particularly when combined with TSM, offers the best efficiency for light and strange loops, while TSM's advantages decrease for charm unless applied to specific observables (e.g., $g_A^c$); time-dilution with HPE can be competitive in heavier-mass scenarios. By integrating plateau and summation analyses and enabling loops at all insertion times, the study provides practical guidance for high-precision flavor-singlet observables and demonstrates scalable GPU-enabled workflows for nucleon-structure studies.

Abstract

A number of stochastic methods developed for the calculation of fermion loops are investigated and compared, in particular with respect to their efficiency when implemented on Graphics Processing Units (GPUs). We assess the performance of the various methods by studying the convergence and statistical accuracy obtained for observables that require a large number of stochastic noise vectors, such as the isoscalar nucleon axial charge. The various methods are also examined for the evaluation of sigma-terms where noise reduction techniques specific to the twisted mass formulation can be utilized thus reducing the required number of stochastic noise vectors.
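As a rough illustration of the kind of estimator compared here, the following minimal sketch estimates $\mathrm{Tr}\,A^{-1}$ with $Z_2$ noise vectors and a truncated-solver style correction: many cheap low-precision solves ($N_{\rm LP}$) plus a few high-precision solves ($N_{\rm HP}$) that correct the bias. It is not the authors' implementation. The small symmetric positive-definite matrix stands in for the Dirac operator, and the function names (`toy_spd_matrix`, `cg`, `tsm_trace_inverse`) and iteration counts are assumptions made for this example only.

```python
import numpy as np


def toy_spd_matrix(n, rng):
    """Hypothetical stand-in for the Dirac operator: a well-conditioned SPD matrix."""
    r = rng.standard_normal((n, n))
    return 4.0 * np.eye(n) + 0.05 * (r + r.T) / 2.0


def cg(a, b, iters):
    """Run at most `iters` conjugate-gradient steps on a x = b, starting from x = 0."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        ap = a @ p
        alpha = rs / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        if rs_new < 1e-30:  # residual is numerically zero, stop early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def tsm_trace_inverse(a, n_hp, n_lp, rng, lp_iters=2, hp_iters=50):
    """Estimate Tr(a^{-1}) from n_lp truncated solves plus n_hp correction solves."""
    n = a.shape[0]

    # Bulk of the statistics: many cheap, truncated (low-precision) solves.
    lp_sum = 0.0
    for _ in range(n_lp):
        eta = rng.choice([-1.0, 1.0], size=n)  # Z2 noise vector
        lp_sum += eta @ cg(a, eta, lp_iters)

    # Bias correction: a few vectors solved to both precisions.
    corr_sum = 0.0
    for _ in range(n_hp):
        eta = rng.choice([-1.0, 1.0], size=n)
        corr_sum += eta @ (cg(a, eta, hp_iters) - cg(a, eta, lp_iters))

    return lp_sum / n_lp + corr_sum / n_hp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = toy_spd_matrix(64, rng)
    print("stochastic estimate:", tsm_trace_inverse(a, n_hp=20, n_lp=200, rng=rng))
    print("exact trace        :", np.trace(np.linalg.inv(a)))
```

In this toy setting the low-precision solves carry most of the statistics, while the correction term removes the bias introduced by truncating the solver. The paper's tuning of $N_{\rm HP}$ and $N_{\rm LP}$ addresses the same trade-off for the actual lattice observables.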

Paper Structure

This paper contains 20 sections, 23 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Tuning of $N_{\rm HP}$ and $N_{\rm LP}$ entering the TSM using the B55.32 ensemble on 50 configurations for the nucleon matrix element of the operator $i\bar{\psi}\gamma_3 D_3\psi$. The insertion time is fixed at $t_{\rm ins}= 8a$ and sink time at $t_{\rm s} = 16a$. The error is shown versus $N_{\rm LP}$ for different values of $N_{\rm HP}$ marked by the different plotting symbols as indicated in the legend.
  • Figure 2: The error versus $N_{\rm LP}$ fixing $N_{\rm HP}=24$ for $\sigma_{\pi N}$ and the isoscalar $g_A$ for 56400 measurements.
  • Figure 3: The error (upper) and the mean value (lower) versus $N_{\rm HP}$ fixing $N_{\rm LP}=300$ for $g_A^s$ using 448 configurations.
  • Figure 4: Strong scaling of the multi-GPU conjugate-gradient solver using the B55.32 ensemble and either 64-bit (double), 32-bit (single) or 16-bit (half) floating point precision.
  • Figure 5: Weak scaling of the multi-GPU conjugate-gradient solver for a local volume $V=24^4$, using the same notation as in Fig. 4.
  • ...and 11 more figures