mLR: Scalable Laminography Reconstruction based on Memoization
Bin Ma, Viktor Nikitin, Xi Wang, Tekin Bicer, Dong Li
TL;DR
ADMM-FFT reconstruction for laminography demands prohibitive compute time and memory. This work introduces mLR, a memoization-driven acceleration framework that addresses both. Its key ideas are replacing expensive FFT calls with cached results, using a CNN-based encoder to map FFT inputs to compact keys, and a distributed memoization layer backed by Faiss and Redis; ADMM-Offload complements these by storing large variables on NVMe SSDs. On a $2K\times2K\times2K$ input problem, mLR improves performance by $52.8\%$ on average (up to $65.4\%$) over the original ADMM-FFT, enabling laminography reconstructions that were previously memory-limited. The approach balances memory savings, computation, and convergence quality, offering a practical path to large-volume 3D imaging on memory-constrained HPC systems. The code is open-sourced for reproducibility.
Abstract
ADMM-FFT is an iterative method with high reconstruction accuracy for laminography, but it suffers from excessive computation time and large memory consumption. We introduce mLR, which employs memoization to replace time-consuming Fast Fourier Transform (FFT) operations, based on a unique observation: similar FFT operations recur across the iterations of ADMM-FFT. We introduce a series of techniques that make applying memoization to ADMM-FFT both performance-beneficial and scalable. We also introduce variable offloading to save CPU memory and to scale ADMM-FFT across GPUs within and across nodes. Using mLR, we scale ADMM-FFT to an input problem of 2K×2K×2K, the largest input problem on which laminography reconstruction with ADMM-FFT has been run under limited memory. mLR brings a 52.8% performance improvement on average (up to 65.4%) compared to the original ADMM-FFT.
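To make the memoization idea concrete, the following is a minimal sketch of approximate FFT memoization, not the mLR implementation. It assumes that a compact key can stand in for the full FFT input and that a cached result may be reused when two keys are close enough. The class name `ApproxFFTMemo`, the block-averaging encoder, the linear nearest-key scan, and the `tol` threshold are all hypothetical simplifications chosen for illustration; the summary above describes a learned CNN encoder and a distributed Faiss/Redis layer in their place.

```python
import numpy as np


class ApproxFFTMemo:
    """Toy approximate memoization for 2D FFTs (illustrative assumption only)."""

    def __init__(self, key_size=8, tol=1e-3):
        self.key_size = key_size  # side length of the compact key grid
        self.tol = tol            # max relative key distance counted as a hit
        self.keys = []            # compact keys of cached inputs
        self.values = []          # cached FFT outputs
        self.hits = 0
        self.misses = 0

    def _encode(self, x):
        # Hypothetical stand-in for a learned encoder:
        # block-average the input down to key_size x key_size.
        k = self.key_size
        h, w = x.shape
        blocks = x[: h - h % k, : w - w % k].reshape(k, h // k, k, w // k)
        return blocks.mean(axis=(1, 3)).ravel()

    def fft2(self, x):
        key = self._encode(x)
        if self.keys:
            # Linear scan for the nearest stored key; a real system would
            # use a similarity-search index instead.
            dists = np.linalg.norm(np.stack(self.keys) - key, axis=1)
            i = int(np.argmin(dists))
            if dists[i] <= self.tol * (np.linalg.norm(key) + 1e-12):
                self.hits += 1
                return self.values[i]  # reuse the cached (approximate) result
        # No sufficiently similar input seen before: compute and cache.
        self.misses += 1
        out = np.fft.fft2(x)
        self.keys.append(key)
        self.values.append(out)
        return out


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
memo = ApproxFFTMemo()
y0 = memo.fft2(x)                                         # first call: computed
y1 = memo.fft2(x + 1e-6 * rng.standard_normal((64, 64)))  # near-duplicate input
y2 = memo.fft2(rng.standard_normal((64, 64)))             # unrelated input
print(memo.hits, memo.misses)
```

A cache hit returns the result computed for a similar input rather than the exact one, so the threshold trades accuracy for saved computation; in this sketch that trade-off is controlled only by `tol`.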
