SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver

Daniel Lersch; Malachi Schram; Zhenyu Dai; Kishansingh Rajput; Xingfu Wu; N. Sato; J. Taylor Childers

TL;DR

SAGIPS reframes large-scale inverse problem solving as a scalable, asynchronous generative workflow that runs across multiple GPUs. The approach uses a GAN-based optimizer and a differentiable pipeline environment, with generator gradients exchanged asynchronously via ring-all-reduce, augmented by grouping and remote memory access to mitigate communication bottlenecks. Empirical results on a Polaris HPC setup demonstrate near-linear weak scaling and convergence quality comparable to traditional methods, with ensembles and distributed training further enhancing robustness and throughput. The work offers a practical pathway to leveraging GANs for complex inverse problems at scale, informing future research on gradient fusion and alternative distributed topologies in HPC settings.

Abstract

Large-scale deep learning algorithms for solving inverse problems have become an essential part of modern research and industrial applications. The complexity of the underlying inverse problem often poses challenges to the algorithm and requires the proper utilization of high-performance computing systems. Because of their design, most deep learning algorithms require custom parallelization techniques to be resource efficient while still converging reasonably well. In this paper we introduce a Scalable Asynchronous Generative Inverse Problem Solver (SAGIPS) workflow for high-performance computing systems. The workflow uses a parallelization approach in which the gradients of the generator network are updated in an asynchronous ring-all-reduce fashion. Experiments with a scientific proxy application demonstrate that SAGIPS shows near-linear weak scaling, together with a convergence quality that is comparable to traditional methods. The approach presented here allows leveraging GANs across multiple GPUs, promising advancements in solving complex inverse problems at scale.
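To make the ring-all-reduce idea mentioned in the abstract concrete, the following is a minimal single-process sketch of the textbook algorithm (a reduce-scatter phase followed by an all-gather phase) in plain Python. It is not the SAGIPS implementation: it is synchronous rather than asynchronous, it sums gradients (whether SAGIPS sums or averages is not stated here), it involves no GPUs or MPI, and the function name `ring_all_reduce` and the list-of-lists representation of per-rank gradients are assumptions made for illustration.

```python
def ring_all_reduce(grads):
    """Simulate a synchronous ring all-reduce (sum) in one process.

    grads: list of equal-length lists, one gradient vector per rank
           (hypothetical layout, chosen for illustration only).
    Returns one buffer per rank; every buffer holds the element-wise sum.
    """
    n = len(grads)
    length = len(grads[0])
    # Split each vector into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]
    bufs = [list(g) for g in grads]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot outgoing data first so all sends of a step are simultaneous.
        msgs = []
        for r in range(n):
            c = (r - step) % n
            msgs.append((c, [bufs[r][i] for i in chunk(c)]))
        for r in range(n):
            c, payload = msgs[r]
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] += v

    # After phase 1, rank r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. Reduced chunks travel once around the ring,
    # overwriting the stale partial sums on each receiving rank.
    for step in range(n - 1):
        msgs = []
        for r in range(n):
            c = (r + 1 - step) % n
            msgs.append((c, [bufs[r][i] for i in chunk(c)]))
        for r in range(n):
            c, payload = msgs[r]
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] = v

    return bufs


# Usage: four simulated ranks, each holding a six-element gradient.
grads = [[rank * 10 + i for i in range(6)] for rank in range(4)]
reduced = ring_all_reduce(grads)
expected = [sum(g[i] for g in grads) for i in range(6)]
assert all(buf == expected for buf in reduced)
```

Each rank only ever talks to its right neighbour and moves roughly 2(n-1)/n of the vector in total, which is why the pattern is attractive for exchanging generator gradients across many GPUs. Dividing the result by the number of ranks would give an average instead of a sum.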

Paper Structure (35 sections, 10 equations, 16 figures, 4 tables, 1 algorithm)

Introduction
The SAGIPS Workflow
The Optimizer
The Environment
Pipeline
Objective Function
Distributed Analysis
Software Environment and Hardware
GAN Optimizer
GAN in a Nutshell
Deep Learning Framework
Parallelization Strategy
Distributed Training of the GAN Workflow
Ensemble Analysis
Asynchronous Data Parallel Training with Overlap
...and 20 more sections

Figures (16)

Figure 1: Schematic representation of the SAGIPS workflow with all its modules and dependencies. The individual models are described in the text.
Figure 2: Schematic representation of training a GAN.
Figure 3: Distributing a common data set across multiple ranks. Each rank has its own copy of the data, but analyzes a fraction only (indicated by transparent rectangle).
Figure 4: Schematic representation of a ring-all-reduce communication between 12 ranks.
Figure 5: Schematic representation of a Remote Memory Access communication between two ranks $i$ and $i+1$.
...and 11 more figures
