Revisiting Page Migration for Main-Memory Database Systems
Yeasir Rayhan, Walid G. Aref
TL;DR
The paper addresses the challenge of efficiently migrating memory pages in main-memory databases (MMDBs) on modern heterogeneous hardware, where non-uniform memory access and tiered or disaggregated memory increase migration costs. It introduces move_pages2, a custom Linux system call that gives MMDBs control over page migration through two knobs, migrate_mode and nr_max_batched_migration, and supports partial migration to better match workload dynamics. Empirical evaluation on NUMA and chiplet platforms, using a YCSB-like workload on a B+-tree, shows up to 2.3x higher query throughput and up to 2.6x higher page migration throughput than the native move_pages; tuning the knobs yields additional gains of up to around 18% in some cases. These results demonstrate the value of DB-OS co-design for memory management on modern hardware and offer practical guidance for tuning page migration in MMDBs.
Abstract
Modern hardware architectures, e.g., NUMA servers, chiplet processors, and tiered and disaggregated memory systems, have significantly improved the performance of main-memory databases and are poised to deliver further improvements. However, realizing this potential depends on the database system's ability to efficiently migrate pages among NUMA nodes and/or memory chips as the workload evolves. Modern main-memory databases offload the migration procedure to the operating system without accounting for the workload and its migration characteristics. In this paper, we propose a custom system call, move_pages2, as an alternative to Linux's native move_pages system call. In contrast to the original move_pages, move_pages2 allows partial migration and exposes two configuration knobs, enabling a main-memory database to tailor the migration process to its specific requirements. Experiments on a main-memory B$^+$-Tree with a YCSB-like workload show that move_pages2 improves B$^+$-Tree query throughput by up to 2.3$\times$ and migrates up to 2.6$\times$ more memory pages than the native Linux system call.
