Table of Contents
Fetching ...

From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines

Mohamed Amine Bergach

Abstract

We present the first kernel-fused SAR Range Doppler pipeline on any GPU platform. By fusing FFT, matched-filter multiply, and IFFT into a single Metal compute dispatch -- keeping all intermediate data in 32\,KiB on-chip memory -- we process a $4096\!\times\!4096$ complex SAR scene in \textbf{370\,ms} on an Apple M1 GPU, a \textbf{22$\times$} speedup over the multi-dispatch baseline (8.16\,s). We further report the first FFT to exploit Apple's \texttt{simdgroup\_matrix} 8$\times$8 hardware MMA, enabled by an in-place Cooley--Tukey decimation-in-frequency formulation that halves the memory footprint versus Stockham. Radar image quality is preserved: all five point targets show 0.0\,dB SNR deviation from the unfused FP32 reference.

From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines

Abstract

We present the first kernel-fused SAR Range Doppler pipeline on any GPU platform. By fusing FFT, matched-filter multiply, and IFFT into a single Metal compute dispatch -- keeping all intermediate data in 32\,KiB on-chip memory -- we process a complex SAR scene in \textbf{370\,ms} on an Apple M1 GPU, a \textbf{22} speedup over the multi-dispatch baseline (8.16\,s). We further report the first FFT to exploit Apple's \texttt{simdgroup\_matrix} 88 hardware MMA, enabled by an in-place Cooley--Tukey decimation-in-frequency formulation that halves the memory footprint versus Stockham. Radar image quality is preserved: all five point targets show 0.0\,dB SNR deviation from the unfused FP32 reference.

Paper Structure

This paper contains 19 sections, 1 equation, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Unfused (top) vs. fused (bottom) range compression. Red: device-memory I/O; blue: compute kernels; green: fused kernel. Dashed boxes are separate GPU dispatches. Dashed arrows are redundant device-memory round-trips eliminated by fusion.