DOLMA: A Data Object Level Memory Disaggregation Framework for HPC Applications
Haoyu Zheng, Shouwei Gao, Jie Ren, Wenqian Dong
TL;DR
HPC centers underutilize memory, which motivates memory disaggregation. DOLMA is a data object-level framework built on RDMA that offloads large, long-lived objects to remote memory while keeping hot objects local, and it uses dual-buffered remote reads and asynchronous writes to hide latency. The approach includes a principled data-object selection strategy, memory-region partitioning, and accommodations for multi-threading, all implemented with one-sided RDMA. Evaluation on eight HPC workloads shows up to 63% local-memory reduction and less than 16% performance degradation relative to a fully local baseline, demonstrating the practical potential of memory disaggregation for HPC workloads.
Abstract
Memory disaggregation is a promising way to scale memory capacity and improve utilization in HPC systems. However, the performance overhead of accessing remote memory poses a significant challenge, particularly for compute-intensive HPC applications whose execution times are highly sensitive to data locality. In this work, we present DOLMA, a Data Object Level Memory disAggregation framework designed for HPC applications. DOLMA intelligently identifies and offloads data objects to remote memory, and it provides quantitative analysis to determine a suitable local memory size. Furthermore, DOLMA leverages the predictable memory access patterns typical of HPC applications to prefetch remote memory via a dual-buffer design. By carefully balancing local and remote memory usage and maintaining multi-thread concurrency, DOLMA provides a flexible and efficient solution for leveraging disaggregated memory in HPC domains while minimally compromising application performance. In an evaluation with eight HPC workloads and computational kernels, DOLMA limits performance degradation to less than 16% while reducing local memory usage by up to 63%, on average.
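To make the dual-buffer idea concrete, the following is a minimal sketch of double-buffered prefetching, not DOLMA's implementation. It assumes a sequential, predictable access pattern over fixed-size chunks. The names `remote_read` and `process_with_dual_buffer` are hypothetical, and a plain Python list stands in for remote memory that a real system would read with one-sided RDMA.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_read(remote, start, size):
    """Hypothetical stand-in for a one-sided RDMA read of one chunk."""
    return remote[start:start + size]


def process_with_dual_buffer(remote, chunk_size, compute):
    """Apply compute() to each chunk while the next chunk is fetched.

    At most two chunks are resident locally at any time: the one being
    computed on and the one being fetched in the background.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # future for the chunk fetched but not yet consumed
        for offset in range(0, len(remote), chunk_size):
            # Start fetching the next chunk before computing on the previous one.
            upcoming = pool.submit(remote_read, remote, offset, chunk_size)
            if pending is not None:
                results.append(compute(pending.result()))
            pending = upcoming
        # Drain the last outstanding chunk.
        if pending is not None:
            results.append(compute(pending.result()))
    return results


if __name__ == "__main__":
    data = list(range(10))  # simulated remote data object
    print(process_with_dual_buffer(data, 4, sum))
```

Because the access order is known in advance, the fetch of chunk i+1 overlaps with the computation on chunk i, which is how a dual-buffer design can hide remote access latency behind compute.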
