Floating-point consistent cross-verification methodology for reproducible and interoperable DDA solvers with fair benchmarking

Clément Argentin; Patrick C. Chaumet; Michel Gross; Maxim A. Yurkin

Abstract

The discrete dipole approximation (DDA) is a widely used and versatile numerical method for solving electromagnetic scattering by arbitrarily shaped objects. Despite its popularity, quantitative comparisons between independent implementations remain challenging due to differences in linear-system conventions, solver settings, and default numerical parameters. In this work, we introduce a unified software-assisted methodology for cross-verification and benchmarking of three major open-source DDA solvers: DDSCAT, ADDA, and IFDDA. We demonstrate how machine-precision agreement can be achieved across implementations by aligning all free parameters and provide practical equivalence tables enabling reproducible and interoperable simulations. Using this methodology, we perform systematic CPU and GPU performance comparisons covering OpenMP, MPI, and CUDA/OpenCL parallelization. Beyond benchmarking, our approach serves as a practical guide for configuring consistent DDA simulations and for understanding how precision, solver choice, and hardware architecture affect runtime, scalability, and accuracy in computational light-scattering studies. The software package also supports regression testing and bitwise reproducibility verification for future code releases.

Paper Structure (34 sections, 37 equations, 4 figures, 10 tables)

Introduction
DDA implementations and workflows
Accuracy, interoperability, and guidelines
Understanding accuracy in DDA simulations
Different linear systems (ADDA, IFDDA, DDSCAT)
Standard DDA (polarization form, e.g., DDSCAT)
Symmetrized-form (change of variable, e.g., ADDA)
Internal field form (e.g., IFDDA)
Relationship between $\mathbf{A}^x$, $\mathbf{A}^E$, $\mathbf{A}^p$
Impact of free parameters on simulation agreement
Default parameters and unified command-line setup
Performance comparison: CPU benchmark
Computational Setup
Time Performance
Precision effects
...and 19 more sections

Figures (4)

Figure 1: Wall-clock time (top row) and speedup relative to the single-core reference (bottom row), for grid sizes $n_x=150$ (left) and $n_x=250$ (right) on the AMD EPYC 9654 node. Log-log scale is used, and the dashed curves indicate ideal linear scaling. Error bars, denoting the sample standard deviation, are shown when it is larger than 5% of the value.
Figure 2: Wall-clock time (top row) and speedup relative to the single-core reference (bottom row), for grid sizes $n_x=150$ (left) and $n_x=250$ (right) on the Intel Core Ultra 7 165H processor. Log-log scale is used, and the dashed curves indicate ideal linear scaling. Error bars, denoting the sample standard deviation, are shown when it is larger than 5% of the value.
Figure 3: GPU timings for DDA codes. Bars are grouped by code and stacked to show FFT time, solver time, and $1$-core wall-time. Horizontal black lines indicate $10$-core wall-times for IFDDA (DP/SP). Grid size $n_x=150$ (top row); $n_x=250$ (bottom row).
Figure E.1: GPU timings for ADDA and ADDA OCL_BLAS mode. Bars are grouped by mode and solver type and stacked to show FFT time, solver time, and 1-core wall-time. Grid size $n_x=150$ (top row); $n_x=250$ (bottom row). Blue-column results (BiCGStab) are the same as in Fig. 3.
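
The quantities plotted in Figures 1 and 2 (speedup relative to the single-core reference, the ideal linear-scaling curve, and the rule that error bars appear only when the sample standard deviation exceeds 5% of the value) can be illustrated with a minimal sketch. The function name `summarize`, the dictionary layout, and the timing numbers are hypothetical and are not taken from the paper or its software package; the 5% rule is applied here to the mean wall-clock time only.

```python
from statistics import mean, stdev

def summarize(timings):
    """Summarize repeated wall-clock timings per core count.

    timings: dict mapping core count -> list of wall-clock times in seconds.
    The entry for 1 core is used as the single-core reference.
    """
    base = mean(timings[1])  # single-core reference time
    rows = []
    for cores in sorted(timings):
        samples = timings[cores]
        avg = mean(samples)
        # Sample standard deviation (n - 1 denominator); needs >= 2 samples.
        sd = stdev(samples) if len(samples) > 1 else 0.0
        rows.append({
            "cores": cores,
            "time": avg,
            "speedup": base / avg,             # measured speedup
            "ideal": float(cores),             # ideal linear scaling
            "show_error_bar": sd > 0.05 * avg, # 5% display threshold
        })
    return rows

# Hypothetical timings, for illustration only.
example = {
    1: [100.0, 102.0, 98.0],
    2: [52.0, 50.0, 54.0],
    4: [30.0, 25.0, 35.0],
}
rows = summarize(example)
for row in rows:
    print(row)
```

With these made-up samples, only the 4-core entry has a standard deviation above the 5% threshold, so it is the only one that would be drawn with an error bar.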
