DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence

Hanze Zhang, Kaiming Wang, Rong Chen, Xingda Wei, Haibo Chen

TL;DR

The paper tackles the scalability bottleneck of cache coherence in disaggregated memory (DM) systems by introducing DiFache, a decentralized caching framework on the compute-node (CN) side. It replaces a centralized coherence manager with per-object, cross-CN invalidation and adaptive caching driven by real-time profits, using a Hopscotch-based cache index and atomic owner tracking. Evaluations on real-world traces and applications show substantial throughput and latency improvements, with speedups of up to $10.83\times$ and significant end-to-end application performance gains. The work offers a practical path toward scalable, coherent CN-side caching in DM and discusses future hardware-coherence opportunities on CXL-based platforms.

Abstract

The disaggregated memory (DM) architecture offers high resource elasticity at the cost of data access performance. While caching frequently accessed data in compute nodes (CNs) reduces access overhead, it requires costly centralized maintenance of cache coherence across CNs. This paper presents DiFache, an efficient, scalable, and coherent CN-side caching framework for DM applications. Observing that DM applications already serialize conflicting remote data access internally rather than relying on the cache layer, DiFache introduces decentralized coherence that aligns its consistency model with memory nodes instead of CPU caches, thereby eliminating the need for centralized management. DiFache features a decentralized invalidation mechanism to independently invalidate caches on remote CNs and a fine-grained adaptive scheme to cache objects with varying read-write ratios. Evaluations using 54 real-world traces from Twitter show that DiFache outperforms existing approaches by up to 10.83$\times$ (5.53$\times$ on average). By integrating DiFache, the peak throughput of two real-world DM applications increases by 7.94$\times$ and 2.19$\times$, respectively.
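
The idea of decentralized invalidation can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions, not DiFache's actual implementation: the per-object owner bitmap, the class names (`MemoryNode`, `ComputeNode`), and the method names are all hypothetical, and plain Python dictionary updates stand in for the remote reads, writes, and atomic operations a real DM system would issue. As the abstract notes, conflicting accesses are assumed to be serialized by the application itself, so the sketch contains no locking.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Toy memory node: object values plus a hypothetical per-object owner bitmap."""
    data: dict = field(default_factory=dict)
    owners: dict = field(default_factory=dict)  # key -> int bitmap of caching CNs

    def read_and_register(self, key, cn_id):
        # Stand-in for atomically setting this CN's owner bit, then reading the object.
        self.owners[key] = self.owners.get(key, 0) | (1 << cn_id)
        return self.data[key]

    def write_and_take_owners(self, key, value):
        # Stand-in for writing the object and atomically swapping the owner word to 0.
        self.data[key] = value
        return self.owners.pop(key, 0)


class ComputeNode:
    """Toy compute node with a local cache; no central coherence manager exists."""

    def __init__(self, cn_id, mn, peers):
        self.cn_id = cn_id
        self.mn = mn
        self.peers = peers  # shared dict: cn_id -> ComputeNode
        self.cache = {}
        peers[cn_id] = self

    def read(self, key):
        if key in self.cache:  # cache hit: no remote access
            return self.cache[key]
        value = self.mn.read_and_register(key, self.cn_id)
        self.cache[key] = value
        return value

    def write(self, key, value):
        # The writer itself invalidates every CN recorded as an owner of this object.
        bitmap = self.mn.write_and_take_owners(key, value)
        for cn_id, peer in self.peers.items():
            if (bitmap >> cn_id) & 1:
                peer.cache.pop(key, None)


mn = MemoryNode(data={"k": 1})
peers = {}
cn_a = ComputeNode(0, mn, peers)
cn_b = ComputeNode(1, mn, peers)
cn_a.read("k")
cn_b.read("k")          # both CNs now cache "k"
cn_a.write("k", 2)      # cn_a invalidates the cached copies directly
fresh = cn_b.read("k")  # cn_b misses and refetches the new value
```

In this model the writer learns which CNs hold a copy from per-object state and contacts only those CNs, which is the property that removes the centralized manager from the critical path. How DiFache encodes owners and orders the write against the invalidations is not specified in the text above.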

Paper Structure

This paper contains 25 sections, 1 equation, 15 figures.

Figures (15)

  • Figure 1: Peak throughput scaling of different caching schemes on DM (left) and median latency breakdown of cache operations (right). Workload: a real-world Twitter trace (No. 4) with 93% reads. Testbed: 9 CNs and 1 MN, connected with 100 Gbps RDMA NICs.
  • Figure 2: The architecture and workflow of centralized (top) and decentralized (bottom) designs of cache coherence.
  • Figure 3: Simplified code snippets of a DM application demonstrating update and lookup operations for leaf nodes in a DM-based tree index (Sherman). The code highlighted with a red background serializes remote accesses.
  • Figure 4: Median latency breakdown of read and update operations in Sherman running YCSB workloads (left), and that of writes and read misses in CMCache running the No. 4 Twitter trace (right). YCSB C has no update latency because it is read-only.
  • Figure 5: The architecture (top) and APIs (bottom) of DiFache.
  • ...and 10 more figures