Taking the Leap: Efficient and Reliable Fine-Grained NUMA Migration in User-space

Felix Schuhknecht, Nick Rassau

TL;DR

On multi-socket NUMA systems, data locality critically impacts performance. The authors introduce page_leap, a user-space approach based on memory rewiring that migrates pages asynchronously with adaptive granularity, guarantees that migration completes, and supports pooled memory; a mechanism based on segmentation faults handles concurrent writes. The authors evaluate it on a two-socket server with 128 GB per NUMA region (256 GB in total) and show that page_leap outperforms both automatic NUMA balancing and Linux's move_pages() for small and huge pages, including under realistic DBMS workloads such as morsel-based processing and TPC-H queries. The work provides a practical, kernel-friendly NUMA migration primitive that can be integrated into DBMS runtimes to improve locality and reduce remote memory accesses.

Abstract

Modern multi-socket architectures offer a single virtual address space, but physically divide main memory across multiple regions, where each region is attached to a CPU and its cores. While this simplifies usage, developers must be aware of non-uniform memory access (NUMA): an access by a thread to its core-local NUMA region is significantly cheaper than an access to a core-remote region. If query answering is parallelized across the cores of multiple regions, then the portion of the database on which the query operates should be distributed across the same regions to ensure local accesses. As the present data placement might not fit this, pages can be migrated from one NUMA region to another to improve the situation. Different options exist for doing so: One option is to rely on the automatic NUMA balancing integrated in Linux, which is steered by the observed access patterns and frequency. Another option is to actively trigger migration via the system call move_pages(). Unfortunately, both variants have significant downsides in terms of their feature set and performance. As an alternative, we propose a new user-space migration method called page_leap() that can perform page migration asynchronously at high performance by exploiting features of the virtual memory subsystem. The method (a) is actively triggered by the user, (b) ensures that all pages are eventually migrated, (c) handles concurrent writes correctly, (d) supports pooled memory, (e) adaptively adjusts its migration granularity based on the workload, and (f) supports both small pages and huge pages.
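
To make the idea of migrating in user-space via the virtual memory subsystem more concrete, the following is a minimal sketch of generic memory rewiring on Linux. It is not the authors' page_leap() implementation: the helper names rewire_page() and rewire_demo() are hypothetical, the pool is a plain memfd, and the sketch neither places the destination page on a particular NUMA region nor protects against concurrent writes during the copy. It only shows the basic step of copying a page's contents to a different physical page and then pointing the same virtual address at the copy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the paper): copy the page currently visible
 * at vaddr into the pool page at dst_off, then rewire vaddr to that page.
 * Writes that race with the copy would be lost; this sketch ignores them. */
int rewire_page(int fd, void *vaddr, size_t page, off_t dst_off)
{
    /* Temporarily map the destination pool page so we can fill it. */
    void *tmp = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, dst_off);
    if (tmp == MAP_FAILED)
        return -1;
    memcpy(tmp, vaddr, page);
    munmap(tmp, page);

    /* MAP_FIXED replaces the existing mapping at vaddr in a single call. */
    void *res = mmap(vaddr, page, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_FIXED, fd, dst_off);
    return res == MAP_FAILED ? -1 : 0;
}

/* Hypothetical demo: a two-page pool, with one virtual page moved from
 * pool page 0 to pool page 1. Returns 0 on success. */
int rewire_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("pool", 0);          /* anonymous in-memory file */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)(2 * page)) != 0) {
        close(fd);
        return -1;
    }

    char *v = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *old = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (v == MAP_FAILED || old == MAP_FAILED) {
        close(fd);
        return -1;
    }

    strcpy(v, "tuple data");
    if (rewire_page(fd, v, page, (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    /* Same virtual address, same contents after the rewiring. */
    assert(strcmp(v, "tuple data") == 0);

    /* v is now backed by pool page 1: a write through v no longer
     * reaches pool page 0, which is still visible through old. */
    v[0] = 'T';
    int ok = (old[0] == 't' && v[0] == 'T');

    munmap(v, page);
    munmap(old, page);
    close(fd);
    return ok ? 0 : -1;
}
```

Calling rewire_demo() from a main() function should return 0 on a Linux system that provides memfd_create(). In a NUMA setting, the destination pool page would additionally have to be physically allocated on the target region before the copy, which this sketch leaves out.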

Paper Structure (13 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Local accesses vs remote accesses under different access patterns for small pages and huge pages.
  • Figure 2: Migration time of move_pages() vs memcpy() from one NUMA region to the other for small and huge pages.
  • Figure 3: Correctly handling concurrent writes for four pages, where the third page is currently under migration.
  • Figure 4: move_pages() vs page_leap() without concurrent accesses for different granularities and page sizes. memcpy() represents the theoretical optimum.
  • Figure 5: Migration under concurrent writes for small pages and a reduction factor of 2 for page_leap().
  • ...and 3 more figures