Flexible Swapping for the Cloud
Milan Pandurov, Lukas Humbel, Dmitry Sepp, Adamos Ttofari, Leon Thomm, Do Le Quoc, Siddharth Chandrasekaran, Sharan Santhanam, Chuan Ye, Shai Bergman, Wei Wang, Sven Lundgren, Konstantinos Sagonas, Alberto Ros
TL;DR
The paper addresses memory overcommit in cloud data centers, where much of the memory allocated to VMs goes unused. It introduces a flexible userspace memory management framework built from per-VM Memory Managers (MMs), a policy API, and an SPDK-backed storage backend, which reclaims cold memory using strict huge page swapping. Key contributions include the architecture (MM, policies, storage backend), VM introspection for policy guidance, kernel-assisted EPT scanning, and zero-copy I/O support. In the evaluation, the framework outperforms the Linux kernel baseline by up to 25% while saving a similar amount of memory, and workload-specific policies save an additional 10% of memory and improve performance in a limited-memory scenario by 30%. By making overcommit VM-aware and reclaim strategies customizable, the approach aims to improve memory utilization and reduce cloud operating costs.
Abstract
Memory has become the primary cost driver in cloud data centers. Yet a significant portion of the memory allocated to VMs in public clouds remains unused. To make better use of this resource, "cold" memory can be reclaimed from VMs and either compressed or moved to slower storage, enabling memory overcommit. Current overcommit systems rely on general-purpose OS swap mechanisms, which are not optimized for virtualized workloads, leading to missed memory-saving opportunities and ineffective use of optimizations such as prefetchers. This paper introduces a userspace memory management framework designed for VMs. It enables custom policies that have full control over the virtual machines' memory through a simple userspace API, supports huge page-based swapping to satisfy VM performance requirements, is easy to deploy by leveraging Linux/KVM, and supports zero-copy I/O virtualization with shared VM memory. Our evaluation demonstrates that an overcommit system based on our framework outperforms state-of-the-art solutions on both micro-benchmarks and commonly used cloud workloads. Specifically, our implementation outperforms the Linux kernel baseline implementation by up to 25% while saving a similar amount of memory. We also demonstrate the benefits of custom policies by implementing workload-specific reclaimers and prefetchers that save 10% additional memory, improve performance in a limited-memory scenario by 30% over the Linux baseline, and recover faster from hard limit releases.
