DecLock: A Case of Decoupled Locking for Disaggregated Memory
Hanze Zhang, Ke Cheng, Rong Chen, Xingda Wei, Haibo Chen
TL;DR
The paper shows that locking in disaggregated memory (DM) systems degrades performance because of contention on memory-node NICs (MN-NICs). It introduces DecLock, built on a cooperative queue-notify locking protocol (CQL) that decouples lock state maintenance on memory nodes (MNs) from ownership transfer across compute nodes (CNs): a centralized queue lives on the MN, while CNs coordinate among themselves, using an atomic 64-bit header and a non-atomic data plane. A timestamp-based hierarchical locking design further reduces queue sizes while preserving cross-CN fairness. In the reported experiments, DecLock achieves up to 43.37× the throughput of RDMA-based spinlocks and up to 1.81× that of MCS locks, and it substantially reduces tail latency for two DM applications, an object store and the Sherman index.
Abstract
This paper reveals that locking can significantly degrade the performance of applications on disaggregated memory (DM), sometimes by several orders of magnitude, due to contention on the NICs of memory nodes (MN-NICs). To address this issue, we present DecLock, a locking mechanism for DM that employs decentralized coordination for ownership transfer across compute nodes (CNs) while retaining centralized state maintenance on memory nodes (MNs). DecLock features cooperative queue-notify locking, which atomically queues lock waiters on MNs and lets clients transfer lock ownership via message-based notifications between CNs. This approach conserves MN-NIC resources for DM applications and ensures fairness. Evaluations show that DecLock achieves throughput improvements of up to 43.37× and 1.81× over state-of-the-art RDMA-based spinlocks and MCS locks, respectively. Furthermore, DecLock helps two DM applications, an object store and a real-world database index (Sherman), avoid performance degradation under high contention, improving throughput by up to 35.60× and 2.31× and reducing 99th-percentile latency by up to 98.8% and 82.1%, respectively.
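
The queue-notify idea described in the abstract can be illustrated with a small single-process sketch. This is not the paper's implementation: threads stand in for CN clients, a Python mutex stands in for one atomic MN-side operation, and per-client mailboxes stand in for CN-to-CN messages. All names here (`MNLockState`, `Client`, `run_demo`, `mn_ops`) are hypothetical. The sketch also folds the slot write into the same atomic step as the queue update, which simplifies away the split between an atomic header and a non-atomic data plane.

```python
import queue
import threading


class MNLockState:
    """Toy stand-in for the lock state kept on a memory node (assumed layout)."""

    def __init__(self):
        self._atomic = threading.Lock()  # emulates one atomic MN-side operation
        self.head = 0                    # ticket of the current lock holder
        self.tail = 0                    # next free ticket
        self.slots = {}                  # ticket -> waiting client id
        self.mn_ops = 0                  # number of MN accesses, for illustration

    def enqueue(self, cn_id):
        """Append a waiter; report whether it became the holder immediately."""
        with self._atomic:
            self.mn_ops += 1
            ticket = self.tail
            self.tail += 1
            self.slots[ticket] = cn_id
            return ticket, ticket == self.head

    def dequeue(self):
        """Remove the holder; return the next waiter's id, or None if empty."""
        with self._atomic:
            self.mn_ops += 1
            del self.slots[self.head]
            self.head += 1
            return self.slots.get(self.head)


class Client:
    """A lock client; mailboxes emulate message-based CN-to-CN notification."""

    def __init__(self, cn_id, mn, mailboxes):
        self.cn_id = cn_id
        self.mn = mn
        self.mailboxes = mailboxes

    def acquire(self):
        _, granted = self.mn.enqueue(self.cn_id)
        if not granted:
            # Wait locally for the predecessor's message; no polling of the MN.
            self.mailboxes[self.cn_id].get()

    def release(self):
        successor = self.mn.dequeue()
        if successor is not None:
            # Hand ownership directly to the next waiter.
            self.mailboxes[successor].put("granted")


def run_demo(num_clients=4, iters=200):
    """Increment a shared counter under the lock from several threads."""
    mn = MNLockState()
    mailboxes = {i: queue.Queue() for i in range(num_clients)}
    shared = {"counter": 0}

    def worker(cn_id):
        client = Client(cn_id, mn, mailboxes)
        for _ in range(iters):
            client.acquire()
            value = shared["counter"]      # unsafe read-modify-write unless
            shared["counter"] = value + 1  # the lock gives mutual exclusion
            client.release()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"], mn.mn_ops
```

In this sketch each acquire and each release touches the MN-side state exactly once, regardless of how long a client waits, whereas a spinlock waiter would re-read the lock word on the MN until it succeeds. Calling `run_demo(4, 200)` returns the final counter together with the number of MN accesses, so the two behaviours can be compared directly.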
