Sampling-based Distributed Training with Message Passing Neural Network

Priyesh Kakka; Sheel Nidhan; Rishikesh Ranade; Jay Pathak; Jonathan F. MacArt

TL;DR

This work tackles the memory and scalability bottleneck of edge-based graph neural networks for PDE surrogates by introducing DS-MPNN, a domain-decomposition and Nyström-inspired sampling framework that distributes MPNN training across multiple GPUs. By constructing graph kernels around sampled centers with radius-based message passing and employing overlap regions for inter-GPU communication, DS-MPNN scales to roughly $O(10^5)$ nodes while maintaining accuracy comparable to a single-GPU MPNN and outperforming node-based GCN baselines. Across Darcy flow, AirfRANS, and 3D step flow experiments, DS-MPNN demonstrates robust performance and significant gains in training and inference speed, confirming the practicality of edge-based PDE surrogates on large, unstructured graphs. The approach lays groundwork for integrating advanced domain partitioning (e.g., METIS) and modern distributed graph libraries to further scale PDE-informed learning.

Abstract

In this study, we introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNNs). Our objective is to address the challenge of scaling edge-based graph neural networks as the number of nodes increases. By coupling this distributed training approach with Nyström-approximation sampling techniques, we present a scalable graph neural network, referred to as DS-MPNN (D and S standing for distributed and sampled, respectively), capable of scaling up to $O(10^5)$ nodes. We validate our sampling and distributed training approach on two cases: (a) a Darcy flow dataset and (b) steady RANS simulations of 2-D airfoils, providing comparisons with both a single-GPU implementation and node-based graph convolution networks (GCNs). The DS-MPNN model achieves accuracy comparable to the single-GPU implementation, accommodates a significantly larger number of nodes than the single-GPU variant (S-MPNN), and significantly outperforms the node-based GCN.
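As a rough illustration of the sampling idea described above, the following sketch draws a random subset of mesh nodes and connects sampled nodes that lie within a fixed radius of each other. It is a minimal NumPy example under stated assumptions, not the authors' implementation: the function name `sample_radius_graph`, uniform random sampling, and the dense pairwise-distance computation are all choices made here for brevity.

```python
import numpy as np

def sample_radius_graph(coords, n_samples, radius, rng):
    """Sample nodes uniformly, then connect sampled pairs within `radius`.

    Hypothetical helper for illustration only; returns the sampled node
    indices and a (2, n_edges) array of directed edges in local numbering.
    """
    # Draw a subset of node indices without replacement.
    idx = rng.choice(coords.shape[0], size=n_samples, replace=False)
    pts = coords[idx]
    # Dense pairwise Euclidean distances (O(m^2) memory; fine for a sketch).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Keep pairs inside the radius and drop self-loops.
    within = (dist <= radius) & ~np.eye(n_samples, dtype=bool)
    src, dst = np.nonzero(within)
    return idx, np.stack([src, dst])

# Example: 2-D point cloud standing in for mesh node coordinates.
rng = np.random.default_rng(0)
coords = rng.random((1000, 2))
idx, edges = sample_radius_graph(coords, n_samples=200, radius=0.1, rng=rng)
```

Because edges are built only among the sampled nodes, the edge count grows with the sample size and radius rather than with the full mesh size, which is the property that makes edge-based models tractable on larger graphs.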

Paper Structure (18 sections, 8 equations, 11 figures, 8 tables, 1 algorithm)

Introduction and Related Work
Model Description
Graph construction
Model algorithm
Methodology
Experiments
Darcy Flow (Structured Data)
AirfRANS (Unstructured Data)
Low-fidelity AirfRANS dataset
High-fidelity AirfRANS dataset
Three-dimensional step flow dataset
Conclusions
Ablation Studies on Scalability and Communication Overhead
Darcy Flow Equations
Darcy Flow Visualizations
...and 3 more sections

Figures (11)

Figure 1: Graph kernel $\mathcal{G}$ construction for an individual node in the low-fidelity AirfRANS dataset. Yellow arrows indicate node and edge sampling; various colors denote distinct computational domains on separate GPUs; and $\Omega_{r}$ represents the distributed domain, encompassing an overlap from neighboring domains with a length of kernel radius $r$.
Figure 2: Methodology representing graph kernels $\mathcal{G}$ from separate distributed domains of an AirfRANS dataset being processed on four individual GPUs. Inter-GPU communication represents the exchange of information between neighboring domains through overlap regions during each hop. After $h$ radius hops, the loss $L$ is calculated on the interior points and aggregated over all GPUs to update the neural network parameters.
Figure 3: Training loss vs. epochs for different schedulers.
Figure 4: Comparison among GCN, S-MPNN, DS-MPNN2 and DS-MPNN4 for a test sample from the low-fidelity AirfRANS dataset.
Figure 5: Comparison of $x$-velocity (velocity in the direction of flow) between the single-GPU (S-MPNN) and four-GPU (DS-MPNN4) models for a test sample of the 3-D step dataset.
...and 6 more figures
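
The captions of Figures 1 and 2 describe subdomains that are extended by an overlap of width $r$ and a loss that is evaluated on interior points only and then aggregated over GPUs. The sketch below illustrates that bookkeeping on a single process. It is an assumption-laden example, not the paper's code: the slab decomposition along $x$, the mean-squared-error loss, and the helper names `partition_with_overlap` and `aggregated_interior_mse` are all hypothetical.

```python
import numpy as np

def partition_with_overlap(coords, n_parts, radius):
    """Split nodes into x-slabs, each extended by `radius` on both sides.

    Returns a list of (node_ids, is_interior) pairs, one per subdomain.
    Slab decomposition is an assumption made for this sketch.
    """
    x = coords[:, 0]
    bounds = np.linspace(x.min(), x.max(), n_parts + 1)
    parts = []
    for p in range(n_parts):
        lo, hi = bounds[p], bounds[p + 1]
        # Interior nodes are owned by exactly one slab; the last slab
        # also owns the right-hand boundary.
        if p == n_parts - 1:
            interior = (x >= lo) & (x <= hi)
        else:
            interior = (x >= lo) & (x < hi)
        # Extended domain: interior plus an overlap of width `radius`.
        extended = (x >= lo - radius) & (x <= hi + radius)
        node_ids = np.nonzero(extended)[0]
        parts.append((node_ids, interior[node_ids]))
    return parts

def aggregated_interior_mse(parts, pred, target):
    """Accumulate squared error over interior nodes of every subdomain."""
    sq_err, count = 0.0, 0
    for node_ids, is_interior in parts:
        owned = node_ids[is_interior]  # overlap nodes are excluded
        sq_err += float(np.sum((pred[owned] - target[owned]) ** 2))
        count += owned.size
    return sq_err / count

# Example with random stand-in data for predictions and targets.
rng = np.random.default_rng(0)
coords = rng.random((500, 2))
pred, target = rng.random(500), rng.random(500)
parts = partition_with_overlap(coords, n_parts=4, radius=0.05)
loss = aggregated_interior_mse(parts, pred, target)
```

Excluding overlap nodes from the loss means every node contributes exactly once, so the aggregated value matches the loss computed on the undivided domain. In a multi-GPU setting the per-subdomain sums and counts would be combined with a collective reduction instead of the Python loop used here.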
