Optimizing FDTD Solvers for Electromagnetics: A Compiler-Guided Approach with High-Level Tensor Abstractions

Yifei He, Måns I. Andersson, Stefano Markidis

TL;DR

The Finite Difference Time Domain (FDTD) method is widely used for solving Maxwell's equations but suffers from memory-bound bottlenecks and portability challenges across modern CPUs. The authors introduce an end-to-end domain-specific compiler built on MLIR/LLVM that expresses FDTD kernels as 3D tensor abstractions, enabling automatic optimizations such as loop tiling, fusion, and vectorization while preserving hardware-agnostic semantics; a curl_step operator captures curl updates for both $\mathbf{H}$ and $\mathbf{E}$ fields within a unified framework. The compilation pipeline progressively lowers high-level tensor operations to machine code for Intel, AMD, and ARM CPUs, achieving up to 10× speedups over a NumPy baseline and demonstrating improved data locality and cache utilization through tiling and vectorization. This work highlights the potential of MLIR-based, domain-specific compilation to deliver portable, high-performance HPC kernels for time-domain electromagnetics, with planned extensions to GPU backends and automated tuning to optimize performance across heterogeneous architectures.

Abstract

The Finite Difference Time Domain (FDTD) method is a widely used numerical technique for solving Maxwell's equations, particularly in computational electromagnetics and photonics. It enables accurate modeling of wave propagation in complex media and structures but comes with significant computational challenges. Traditional FDTD implementations rely on handwritten, platform-specific code that optimizes certain kernels while underperforming in others. The lack of portability increases development overhead and creates performance bottlenecks, limiting scalability across modern hardware architectures. To address these challenges, we introduce an end-to-end domain-specific compiler based on the MLIR/LLVM infrastructure for FDTD simulations. Our approach generates efficient and portable code optimized for diverse hardware platforms. We implement the three-dimensional FDTD kernel as operations on a 3D tensor abstraction with explicit computational semantics. High-level optimizations such as loop tiling, fusion, and vectorization are automatically applied by the compiler. We evaluate our customized code generation pipeline on Intel, AMD, and ARM platforms, achieving up to $10\times$ speedup over a baseline Python implementation using NumPy.
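To make the kind of kernel discussed here concrete, the following is a minimal sketch of a single curl update for the $H_x$ component, written once as a naive triple loop and once with NumPy slicing. It assumes a standard Yee-style discretization with uniform grid spacing, with all fields stored in arrays of the same shape. The function names, the `coeff` parameter (standing in for $\Delta t / (\mu \Delta x)$), and the array layout are illustrative assumptions, not the authors' implementation or the semantics of their `curl_step` operator.

```python
import numpy as np


def update_hx_naive(hx, ey, ez, coeff):
    """Loop-based sketch: Hx -= coeff * (dEz/dy - dEy/dz) (hypothetical helper)."""
    out = hx.copy()
    nx, ny, nz = hx.shape
    for i in range(nx):
        for j in range(ny - 1):
            for k in range(nz - 1):
                # x-component of curl(E) from forward differences
                curl_x = (ez[i, j + 1, k] - ez[i, j, k]) - (ey[i, j, k + 1] - ey[i, j, k])
                out[i, j, k] -= coeff * curl_x
    return out


def update_hx_numpy(hx, ey, ez, coeff):
    """Same update expressed with array slices (hypothetical helper)."""
    out = hx.copy()
    d_ez_dy = ez[:, 1:, :-1] - ez[:, :-1, :-1]
    d_ey_dz = ey[:, :-1, 1:] - ey[:, :-1, :-1]
    out[:, :-1, :-1] -= coeff * (d_ez_dy - d_ey_dz)
    return out


# Small random fields; the shape and coefficient are arbitrary choices for the sketch.
rng = np.random.default_rng(0)
shape = (4, 5, 6)
hx, ey, ez = (rng.standard_normal(shape) for _ in range(3))
coeff = 0.5

hx_naive = update_hx_naive(hx, ey, ez, coeff)
hx_vec = update_hx_numpy(hx, ey, ez, coeff)
print(np.allclose(hx_naive, hx_vec))
```

Both versions stream three full 3D arrays through memory while doing only a few floating-point operations per grid point, which is one way to see why such updates tend to be memory-bound and why tiling and fusion are natural optimization targets.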

Paper Structure

This paper contains 17 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a): Diagram illustrating the overall domain-specific compilation and code-generation pipeline for FDTD. (b): The $E_x$ field in a cavity domain with a random initial state, showing only positive values after thresholding. Note the enforced PEC boundary conditions.
  • Figure 2: Comparison of the full FDTD algorithm with naive and NumPy-based implementations of the curl operator for $H_x$.
  • Figure 3: MLIR example code throughout the optimization pipeline (abridged for clarity and space). (A–B): Input tensor payload IR and transform IR. (C–D): IR illustrating transformations; (C) before bufferization, (D) after bufferization, showing tensors replaced with memrefs.
  • Figure 4: (A–C) Performance comparison of single-threaded FDTD versus NumPy (baseline: NumPy double precision) on Intel, AMD, and ARM CPUs. (D) Performance breakdown for different optimization combinations.