Multi-GPU fast Fourier transforms in MATLAB (for large-scale phase-field crystal simulations)

Maik Punke, Marco Salvalaglio

Abstract

We present a MATLAB-based framework for two- and three-dimensional fast Fourier transforms on multiple GPUs for large-scale numerical simulations using the pseudo-spectral Fourier method. The software implements two complementary multi-GPU strategies that overcome single-GPU memory limitations and accelerate spectral solvers. This approach is motivated by and applied to phase-field crystal (PFC) models, which are governed by tenth-order partial differential equations, require fine spatial resolution, and are typically formulated in periodic domains. Our resulting numerical framework achieves significant speedups, approximately sixfold for standard PFC simulations and up to sixtyfold for multiphysics extensions, compared to a purely CPU-based implementation running on hundreds of cores.

Paper Structure

This paper contains 8 sections, 8 equations, 2 figures.

Figures (2)

  • Figure 1: (a) Schematic of the multi-GPU FFT algorithm based on slab decomposition for a three-dimensional array. The data are decomposed along the $z$-direction, followed by local two-dimensional FFTs, peer-to-peer communication, and a final one-dimensional FFT (upper panel). Relative runtimes for $1000$ time steps are shown, normalized by CPU execution time. Speedups of up to a factor of six are observed, with optimal performance on a single GPU for $750^3$ and multi-GPU execution required for larger domains due to memory constraints. An array of size $1400^3$ fits only on four H100 GPUs and reaches approximately 17% of the CPU runtime (lower panel). (b) Decomposition of a pseudo-spectral multiphysics PFC solver (e.g., hydrodynamic PFC) across four GPUs (upper panel). Relative runtimes of the hydrodynamic PFC solver are shown as a function of problem size, demonstrating that multi-GPU execution enables simulations up to a problem size of $900^3$, which is infeasible on a single GPU. Compared to a CPU implementation, speedups of up to 60$\times$ are achieved (lower panel).
  • Figure 2: Representative large-scale PFC benchmark problems for multi-GPU FFT algorithms: (a) Dendritic solidification (underlying triangular crystal symmetry) using the multi-GPU single-FFT implementation (2D example; computational domain of size $5\cdot 10^4\times 5\cdot 10^4$, corresponding to $2.5\,\mu\mathrm{m}\times 2.5\,\mu\mathrm{m}$ when assuming a lattice constant of $4\,\text{Å}$ for aluminum). The density field $\psi$ is shown along with a close-up. White lines delineate patches, as the array exceeds single-plot size limits. (b) Polycrystalline coarsening of an FCC crystal structure using the multi-GPU hydrodynamic PFC solver. Visualized are the density field $\psi$ (including a magnified view) and the velocity components $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$. A grid of $1400\times 1400\times 1400$ is used, which corresponds to a box size of $40\,\mathrm{nm}\times 40\,\mathrm{nm}\times 40\,\mathrm{nm}$. The material and model parameters are documented in the repository accompanying this work.
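
The slab decomposition described in the caption of Figure 1(a) can be illustrated with a minimal host-side sketch. It is written in Python with NumPy rather than MATLAB, runs in a single process, and emulates the GPUs with plain array chunks, so it shows only the order of operations and is not the authors' implementation. The names `slab_fft3` and `n_devices` are hypothetical, and the peer-to-peer communication step is replaced by an in-memory concatenate-and-resplit.

```python
import numpy as np


def slab_fft3(a, n_devices):
    """3D FFT via slab decomposition; array chunks stand in for GPUs."""
    # 1) Decompose along z: each "device" holds a slab of full x-y planes.
    slabs = np.array_split(a, n_devices, axis=2)

    # 2) Local 2D FFTs over x and y; no data from other slabs is needed.
    slabs = [np.fft.fft2(s, axes=(0, 1)) for s in slabs]

    # 3) Redistribution (assumption: emulates the peer-to-peer exchange).
    #    After re-splitting along x, each chunk holds complete z-lines.
    full = np.concatenate(slabs, axis=2)
    chunks = np.array_split(full, n_devices, axis=0)

    # 4) Final local 1D FFT along z, then reassemble the full spectrum.
    chunks = [np.fft.fft(c, axis=2) for c in chunks]
    return np.concatenate(chunks, axis=0)


# Usage: compare against NumPy's monolithic 3D FFT on a small test array.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 6, 10))
matches = np.allclose(slab_fft3(a, n_devices=4), np.fft.fftn(a))
print(matches)
```

The sketch works because the three-dimensional DFT is separable: transforming the $x$-$y$ planes first and the $z$-lines afterwards gives the same result as a single 3D transform, so only the redistribution step requires data from other devices.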