Floating-point consistent cross-verification methodology for reproducible and interoperable DDA solvers with fair benchmarking

Clément Argentin, Patrick C. Chaumet, Michel Gross, Maxim A. Yurkin

Abstract

The discrete dipole approximation (DDA) is a widely used and versatile numerical method for solving electromagnetic scattering by arbitrarily shaped objects. Despite its popularity, quantitative comparisons between independent implementations remain challenging due to differences in linear-system conventions, solver settings, and default numerical parameters. In this work, we introduce a unified software-assisted methodology for cross-verification and benchmarking of three major open-source DDA solvers: DDSCAT, ADDA, and IFDDA. We demonstrate how machine-precision agreement can be achieved across implementations by aligning all free parameters and provide practical equivalence tables enabling reproducible and interoperable simulations. Using this methodology, we perform systematic CPU and GPU performance comparisons covering OpenMP, MPI, and CUDA/OpenCL parallelization. Beyond benchmarking, our approach serves as a practical guide for configuring consistent DDA simulations and for understanding how precision, solver choice, and hardware architecture affect runtime, scalability, and accuracy in computational light-scattering studies. The software package also supports regression testing and bitwise reproducibility verification for future code releases.

Paper Structure

This paper contains 34 sections, 37 equations, 4 figures, and 10 tables.

Figures (4)

  • Figure 1: Wall-clock time (top row) and speedup relative to the single-core reference (bottom row), for grid sizes $n_x=150$ (left) and $n_x=250$ (right) on the AMD EPYC 9654 node. Log-log scale is used, and the dashed curves indicate ideal linear scaling. Error bars, denoting the sample standard deviation, are shown when it is larger than 5% of the value.
  • Figure 2: Wall-clock time (top row) and speedup relative to the single-core reference (bottom row), for grid sizes $n_x=150$ (left) and $n_x=250$ (right) on the Intel Core Ultra 7 165H processor. Log-log scale is used, and the dashed curves indicate ideal linear scaling. Error bars, denoting the sample standard deviation, are shown when it is larger than 5% of the value.
  • Figure 3: GPU timings for DDA codes. Bars are grouped by code and stacked to show FFT time, solver time, and $1$-core wall-time. Horizontal black lines indicate $10$-core wall-times for IFDDA (DP/SP). Grid size $n_x=150$ (top row); $n_x=250$ (bottom row).
  • Figure E.1: GPU timings for ADDA and ADDA OCL_BLAS mode. Bars are grouped by mode and solver type and stacked to show FFT time, solver time, and 1-core wall-time. Grid size $n_x=150$ (top row); $n_x=250$ (bottom row). Blue-column results (BiCGStab) are the same as in Fig. 3.
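
The quantities plotted in Figures 1 and 2 (speedup relative to the single-core reference, the ideal linear-scaling curve, and the rule that error bars appear only when the sample standard deviation exceeds 5% of the value) can be illustrated with a minimal Python sketch. This is not the authors' benchmarking code: the function name `summarize_scaling`, the input layout, and the timings are hypothetical and chosen purely for illustration.

```python
from statistics import mean, stdev


def summarize_scaling(timings):
    """Summarize repeated wall-clock timings per core count.

    `timings` maps a core count to a list of repeated wall-clock times
    in seconds (hypothetical layout, not taken from the paper).
    """
    t_ref = mean(timings[1])  # single-core reference time
    rows = []
    for cores in sorted(timings):
        runs = timings[cores]
        t_mean = mean(runs)
        # Sample standard deviation needs at least two runs.
        t_std = stdev(runs) if len(runs) > 1 else 0.0
        rows.append({
            "cores": cores,
            "time": t_mean,
            "speedup": t_ref / t_mean,   # relative to the single-core run
            "ideal": float(cores),       # ideal linear scaling
            # Caption rule: draw an error bar only if std > 5% of the value.
            "show_error_bar": t_std > 0.05 * t_mean,
        })
    return rows


# Made-up timings, for illustration only.
example = {
    1: [100.0, 100.0, 100.0],
    2: [50.0, 50.0, 50.0],
    4: [20.0, 25.0, 30.0],
}

for row in summarize_scaling(example):
    print(row)
```

Comparing `speedup` with `ideal` at each core count gives the gap between measured and linear scaling that the dashed curves in the figures make visible.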