
Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes

Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning

TL;DR

The paper addresses the need for flexible, distributed multi-dimensional FFTs tailored to plane-wave density functional theory codes that operate on batched spherical data. It introduces FFTB, a modular framework with a processing-grid API that supports both cuboid and sphere-based data, enabling batched and non-batched transforms on CPU and GPU backends. The approach fuses local transforms with data movement through a programmable pipeline, which improves scalability on HPC systems, and it reduces redundant padding via staged padding strategies. Experimental results on GPU-accelerated systems demonstrate strong scaling and the practical benefits of batching for plane-wave FFTs, highlighting FFTB’s potential to accelerate plane-wave DFT workflows across diverse architectures. The work offers a path toward integrating flexible FFTs into existing DFT codes and extending support to future HPC platforms, with an open-source release planned.

Abstract

Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications, from materials science, physics, and chemistry to machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied to data mapped onto rectangular grids. However, not all scientific applications conform to this pattern; for example, plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied to data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored to the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB), a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. We provide a flexible implementation with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HPE Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.

Paper Structure (13 sections, 10 equations, 9 figures, 1 table)


Figures (9)

  • Figure 1: Two algorithms used to compute a three-dimensional Fourier transform. The top algorithm decomposes the computation into a batch of 2D transforms applied in the $xy$-plane and a batch of 1D transforms applied in the $z$ dimension. The bottom algorithm applies the three-dimensional Fourier transform as three groups of 1D Fourier transforms, one group per dimension of the input three-dimensional tensor.
  • Figure 2: Each wavefunction is decomposed using a Fourier series expansion and only the complex coefficients within a cut-off sphere are kept. The 3D Fourier computation requires the data to be on a cuboid grid. As such, the data can be padded with zeros to a cube typically of width twice the diameter of the sphere.
  • Figure 3: The padding operation can also be split by dimensions. The padding is done in the $x$-dimension first, followed by the $y$-dimension and $z$-dimension. The 3D Fourier transform has a similar decomposition, so after each padding operation, the 1D transform can be immediately applied. Exploiting the structure of the data will reduce the amount of data that is being communicated and computed upon.
  • Figure 4: The structure of FFTB. The API (green block) contains the main functionalities for describing distributed Fourier transforms. The intermediate block (yellow block) creates and links the stages of the Fourier transform based on the distribution of the inputs/outputs. The main stages are either local computation stages (red block) or data movement stages (orange block).
  • Figure 5: A 3D Fourier transform applied on an input tensor distributed in the $x$-dimension. The result of the Fourier transform is a tensor that is distributed in the $z$-dimension.
  • ...and 4 more figures
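
The staged padding described in the captions of Figures 1 and 3 can be illustrated with a small single-process sketch. It is not FFTB code: the function names (`dft`, `transform_axis`, `transform_3d`, `compare_staged_and_dense`) are hypothetical, a naive O(n^2) DFT stands in for an optimized FFT library, and there is no data distribution or batching. The sketch stores only the coefficients inside the cut-off sphere, pads one dimension at a time, and transforms only the 1D pencils that hold data. It then checks the result against padding to the full cube first, assuming a grid width of 8 and a sphere radius of 2.

```python
import cmath
import random


def dft(line):
    """Naive O(n^2) 1D DFT; stands in for an optimized FFT library call."""
    n = len(line)
    return [sum(line[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transform_axis(tensor, axis, n):
    """Zero-pad along `axis` to length n, then transform every pencil holding data.

    `tensor` maps (x, y, z) index tuples to complex values; absent entries are zero.
    Returns the transformed tensor and the number of 1D transforms performed.
    """
    pencils = {}
    for idx, value in tensor.items():
        key = idx[:axis] + idx[axis + 1:]  # the two indices that stay fixed
        pencils.setdefault(key, [0j] * n)[idx[axis]] = value
    out = {}
    for key, line in pencils.items():
        for k, value in enumerate(dft(line)):
            out[key[:axis] + (k,) + key[axis:]] = value
    return out, len(pencils)


def transform_3d(tensor, n):
    """Three passes of 1D transforms, one per dimension (x, then y, then z)."""
    total = 0
    for axis in range(3):
        tensor, count = transform_axis(tensor, axis, n)
        total += count
    return tensor, total


def compare_staged_and_dense(n, radius, seed=0):
    """Run the staged and the fully padded variants and compare them."""
    rng = random.Random(seed)
    span = range(-radius, radius + 1)
    # Coefficients inside the cut-off sphere, centred on the origin; negative
    # frequencies wrap around modulo n.
    sphere = {(x % n, y % n, z % n): complex(rng.random(), rng.random())
              for x in span for y in span for z in span
              if x * x + y * y + z * z <= radius * radius}
    # Dense variant: pad to the full n^3 cube first, so every pencil is transformed.
    grid = range(n)
    cube = {(x, y, z): sphere.get((x, y, z), 0j)
            for x in grid for y in grid for z in grid}

    staged, staged_count = transform_3d(sphere, n)
    dense, dense_count = transform_3d(cube, n)
    max_err = max(abs(dense[idx] - staged.get(idx, 0j)) for idx in dense)
    return max_err, staged_count, dense_count


max_err, staged_count, dense_count = compare_staged_and_dense(n=8, radius=2)
print(f"max abs difference: {max_err:.2e}")
print(f"1D transforms: staged={staged_count}, dense={dense_count}")
```

Both variants produce the same output up to rounding, but the staged one skips every pencil that contains only padding zeros in the first two passes. In a distributed setting the same observation means less data to exchange between stages, which is the saving the Figure 3 caption refers to.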