GPUVM: GPU-driven Unified Virtual Memory

Nurlan Nazaraliyev, Elaheh Sadredini, Nael Abu-Ghazaleh

TL;DR

This paper proposes GPUVM, a GPU memory management system that uses an RDMA-capable network device to construct a virtual memory system without involving the CPU/OS, and achieves performance up to 4x higher than UVM for latency-bound applications while providing accessible programming abstractions that do not require the users to manage memory transfers directly.

Abstract

Graphics Processing Units (GPUs) leverage massive parallelism and large memory bandwidth to support high-performance computing applications, such as multimedia rendering, crypto-mining, deep learning, and natural language processing. These applications require models and datasets that are getting bigger in size and currently challenge the memory capacity of a single GPU, causing substantial performance overheads. To address this problem, a programmer has to partition the data and manually transfer data in and out of the GPU. This approach requires programmers to carefully tune their applications and can be impractical for workloads with irregular access patterns, such as deep learning, recommender systems, and graph applications. To ease programmability, programming abstractions such as unified virtual memory (UVM) can be used, creating a virtually unified memory space across the whole system and transparently moving the data on demand as it is accessed. However, UVM brings in the overhead of the OS involvement and inefficiencies due to generating many transfer requests especially when the GPU memory is oversubscribed. This paper proposes GPUVM, a GPU memory management system that uses an RDMA-capable network device to construct a virtual memory system without involving the CPU/OS. GPUVM enables on-demand paging for GPU applications and relies on GPU threads for memory management and page migration. Since CPU chipsets do not support GPU-driven memory management, we use a network interface card to facilitate transparent page migration from/to the GPU. GPUVM achieves performance up to 4x higher than UVM for latency-bound applications while providing accessible programming abstractions that do not require the users to manage memory transfers directly.

Paper Structure

This paper contains 21 sections, 1 equation, 16 figures, 3 tables.

Figures (16)

  • Figure 1: UVM architecture. PU refers to GPU processing units (Streaming Multiprocessors, or SMs, on NVIDIA GPUs).
  • Figure 2: Breakdown of UVM page transfer latency. Note that host involvement overheads during the page fault are around 7$\times$ higher than the transfer time at 64KB page size.
  • Figure 3: Schematic representation of GPUVM design.
  • Figure 4: GPUVM system workflow for a single thread.
  • Figure 5: GPUVM page mapping: Host memory contains all pages. GPU memory is organized as a circular page buffer. Red represents mapped pages, and green unmapped pages.
  • ...and 11 more figures