GPU Sharing with Triples Mode

Chansup Byun, Albert Reuther, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alexander Bonn, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner

TL;DR

The paper addresses rising GPU demand from AI/ML workloads and proposes a GPU-sharing approach that extends the triples mode of MIT Lincoln Laboratory's LLsub and LLMapReduce tools. A triple $(NNODE, NPPN, NTPP)$ defines a node-based mapping of work onto compute resources, and automatically generated scripts assign GPUs to processes via CUDA_VISIBLE_DEVICES, so no scheduler changes are required. Experiments on MNIST/LeNet-4 and ImageNet/ResNet-18 show throughput gains of up to about $10\times$ for MNIST and about $2.56\times$ for ImageNet under suitable configurations; the results also point to GPU memory constraints and scalability limits. The approach is practical to deploy in production HPC environments and can raise GPU utilization for parametric studies and multi-application workloads.
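To make the GPU assignment idea concrete, here is a minimal Python sketch of how a launcher could map the processes on one node to GPUs through CUDA_VISIBLE_DEVICES. It is an illustration only, not the script that LLsub or LLMapReduce generates. It assumes that NPPN is the number of processes per node and NTPP the number of threads per process; the round-robin policy, the function names, the `gpus_per_node` parameter, and the use of OMP_NUM_THREADS for the thread count are all assumptions made for this example.

```python
import os


def gpu_for_local_process(local_rank: int, nppn: int, gpus_per_node: int) -> int:
    """Pick a GPU index for one process on a node.

    Round-robin is an assumed policy for illustration; the real tool
    may distribute processes differently.
    """
    if nppn < 1 or gpus_per_node < 1:
        raise ValueError("nppn and gpus_per_node must be positive")
    if not 0 <= local_rank < nppn:
        raise ValueError("local_rank must be in [0, nppn)")
    return local_rank % gpus_per_node


def child_environment(local_rank, nppn, ntpp, gpus_per_node, base_env=None):
    """Build the environment for one process without modifying os.environ."""
    env = dict(os.environ if base_env is None else base_env)
    # Restrict the process to a single GPU, so several processes share it.
    env["CUDA_VISIBLE_DEVICES"] = str(
        gpu_for_local_process(local_rank, nppn, gpus_per_node)
    )
    # Assumption: the thread count is passed to the process via OMP_NUM_THREADS.
    env["OMP_NUM_THREADS"] = str(ntpp)
    return env


if __name__ == "__main__":
    # Hypothetical triple (NNODE=1, NPPN=8, NTPP=5) on a node with 2 GPUs.
    for rank in range(8):
        env = child_environment(rank, nppn=8, ntpp=5, gpus_per_node=2, base_env={})
        print(rank, env["CUDA_VISIBLE_DEVICES"], env["OMP_NUM_THREADS"])
```

In this sketch, eight processes on a two-GPU node end up four per GPU. A launcher could pass each returned dictionary as the `env` argument of `subprocess.Popen` to start the corresponding training process.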

Abstract

There is a tremendous amount of interest in AI/ML technologies due to the proliferation of generative AI applications such as ChatGPT. This trend has significantly increased demand for GPUs, which are the workhorses for training AI models. Because GPUs are expensive and in limited supply, HPC centers have a strong interest in optimizing GPU usage. The MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed an easy-to-use GPU sharing feature supported by LLSC-developed tools, including LLsub and LLMapReduce. This approach overcomes some of the limitations of existing methods for GPU sharing. It allows users to apply GPU sharing whenever possible while they develop AI/ML models, run parametric studies on those models, or execute other GPU applications. In our initial experiments, GPU sharing with triples mode was easy to use and significantly improved GPU usage and throughput for certain types of AI applications.

Paper Structure

This paper contains 6 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: An LLload snapshot of resource usage when training the LeNet-4 ML model with the MNIST dataset using PyTorch.
  • Figure 2: Observed GPU load distribution with respect to the number of concurrent training jobs.
  • Figure 3: Observed GPU memory usage distribution with respect to the number of concurrent training jobs.
  • Figure 4: Individual training time variation with respect to the number of concurrent training jobs on a single node with two NVIDIA V100 (Volta) GPUs.
  • Figure 5: Speedup of the whole training job based on the job elapsed time with respect to the number of concurrent training jobs.
  • ...and 4 more figures