SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Daniel Lersch, Malachi Schram, Zhenyu Dai, Kishansingh Rajput, Xingfu Wu, N. Sato, J. Taylor Childers
TL;DR
SAGIPS reframes large-scale inverse problem solving as a scalable, asynchronous generative workflow that runs across multiple GPUs. The approach uses a GAN-based optimizer and a differentiable pipeline environment, with generator gradients exchanged asynchronously via ring-all-reduce, augmented by grouping and remote memory access to mitigate communication bottlenecks. Empirical results on a Polaris HPC setup demonstrate near-linear weak scaling and convergence quality comparable to traditional methods, with ensembles and distributed training further enhancing robustness and throughput. The work offers a practical pathway to leveraging GANs for complex inverse problems at scale, informing future research on gradient fusion and alternative distributed topologies in HPC settings.
Abstract
Large-scale deep learning algorithms for solving inverse problems have become an essential part of modern research and industrial applications. The complexity of the underlying inverse problem often poses challenges to the algorithm and requires proper utilization of high-performance computing systems. By design, most deep learning algorithms require custom parallelization techniques to be resource-efficient while still converging reasonably well. In this paper we introduce a \underline{S}calable \underline{A}synchronous \underline{G}enerative \underline{I}nverse \underline{P}roblem \underline{S}olver (SAGIPS), a workflow for solving inverse problems on high-performance computing systems. The workflow uses a parallelization approach in which the gradients of the generator network are updated in an asynchronous ring-all-reduce fashion. Experiments with a scientific proxy application demonstrate that SAGIPS achieves near-linear weak scaling, together with a convergence quality comparable to that of traditional methods. The approach presented here allows GANs to be leveraged across multiple GPUs, promising advances in solving complex inverse problems at scale.
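To make the ring-all-reduce pattern named above concrete, the following is a minimal single-process sketch of the classic two-phase scheme (reduce-scatter followed by all-gather) applied to per-worker gradient vectors. It is an illustration only and not the SAGIPS implementation: the function name `ring_all_reduce` is hypothetical, the workers are simulated as Python lists rather than GPUs, and the exchange here is synchronous and sums the gradients, whereas the paper describes an asynchronous variant whose details are not given in this section.

```python
def ring_all_reduce(grads):
    """Simulate a synchronous ring all-reduce (sum) over per-worker gradient lists.

    grads: one list of numbers per simulated worker, all of equal length.
    Returns the per-worker buffers after the exchange; every buffer ends up
    holding the element-wise sum of all inputs.
    """
    n = len(grads)
    size = len(grads[0])
    assert all(len(g) == size for g in grads), "all gradients must have equal length"

    # Split each vector into n contiguous chunks (some may be empty if size < n).
    bounds = [(k * size // n, (k + 1) * size // n) for k in range(n)]
    bufs = [list(g) for g in grads]

    # Phase 1, reduce-scatter: in each step every worker passes one chunk to its
    # right-hand neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot outgoing messages first so all sends in a step are simultaneous.
        msgs = []
        for rank in range(n):
            chunk = (rank - step) % n
            lo, hi = bounds[chunk]
            msgs.append((chunk, bufs[rank][lo:hi]))
        for rank, (chunk, payload) in enumerate(msgs):
            dst = (rank + 1) % n
            lo, _ = bounds[chunk]
            for j, value in enumerate(payload):
                bufs[dst][lo + j] += value

    # After phase 1, worker r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = []
        for rank in range(n):
            chunk = (rank + 1 - step) % n
            lo, hi = bounds[chunk]
            msgs.append((chunk, bufs[rank][lo:hi]))
        for rank, (chunk, payload) in enumerate(msgs):
            dst = (rank + 1) % n
            lo, hi = bounds[chunk]
            bufs[dst][lo:hi] = payload

    return bufs


if __name__ == "__main__":
    # Three simulated workers, each with a four-element "gradient".
    workers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, buf in enumerate(ring_all_reduce(workers)):
        print(rank, buf)
```

Each worker sends and receives only one chunk per step, so the per-worker communication volume stays roughly constant as workers are added, which is the usual motivation for a ring topology. Dividing the result by the number of workers would give the averaged gradient instead of the sum.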
