The Dawn of Disaggregation and the Coherence Conundrum: A Call for Federated Coherence

Jaewan Hong, Marcos K. Aguilera, Emmanuel Amaro, Vincent Liu, Aurojit Panda, Ion Stoica

TL;DR

This work argues that global cache coherence is impractical for disaggregated memory in large data centers because of scalability limits and hardware complexity. It formally defines federated coherence, a model that provides coherence only within each node, which eliminates inter-node coherence traffic and reuses the intra-node coherence that processors already provide. The authors describe node ownership, immutability, versioning, and a coordination library as the core primitives, and illustrate applications across microservices, publish-subscribe, and immutable object stores. A call to action urges the hardware and software communities to converge on a feasible model, develop end-to-end benchmarks, and explore language support and tooling for using federated coherence in real systems.

Abstract

Disaggregated memory is an upcoming data center technology that will allow nodes (servers) to share data efficiently. Sharing data creates a debate on the level of cache coherence the system should provide. While current proposals aim to provide coherence for all or parts of the disaggregated memory, we argue that this approach is problematic, because of scalability limitations and hardware complexity. Instead, we propose and formally define federated coherence, a model that provides coherence only within nodes, not across nodes. Federated coherence can use current intra-node coherence provided by processors without requiring expensive mechanisms for inter-node coherence. Developers can use federated coherence with a few simple programming paradigms and a synchronization library. We sketch some potential applications.

Paper Structure

This paper contains 11 sections and 1 figure.

Figures (1)

  • Figure 1: Overhead of cache coherence vs. core count, where each core contains a pinned thread that repeatedly increments a shared global variable atomically. We measure the aggregate rate of increments per second; coherence overhead is the ratio between the rate in a non-cache-coherent system and the rate in a cache-coherent system. Non-cache coherence is emulated by having each thread increment a different variable. The solid line plots measured results, while the others are extrapolated using varying latency to disaggregated memory. Red vertical lines mark NUMA node boundaries.
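
The measurement described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' benchmark code: the names `PaddedCounter`, `measure_rate`, and `coherence_overhead` are hypothetical, the 64-byte cache line size is an assumption, and thread pinning is omitted because it is platform-specific (on Linux it could be added with `pthread_setaffinity_np`). The extrapolation to disaggregated-memory latencies shown in the figure is not reproduced here.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per cache line so that private counters never share a line.
// The 64-byte line size is an assumption; over-aligned allocation needs C++17.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Runs num_threads threads for roughly run_time and returns the aggregate
// number of increments per second. With shared == true every thread
// increments the same counter (the cache-coherent case). Otherwise thread i
// increments its own counter, which emulates a non-cache-coherent system.
// Hypothetical helper; threads are NOT pinned to cores in this sketch.
double measure_rate(std::size_t num_threads, bool shared,
                    std::chrono::milliseconds run_time) {
    std::vector<PaddedCounter> counters(num_threads);
    std::atomic<bool> start{false};
    std::atomic<bool> stop{false};

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) {
        std::atomic<std::uint64_t>* target = &counters[shared ? 0 : i].value;
        workers.emplace_back([target, &start, &stop] {
            // Spin until all threads have been created.
            while (!start.load(std::memory_order_acquire)) {
            }
            while (!stop.load(std::memory_order_relaxed)) {
                target->fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    const auto begin = std::chrono::steady_clock::now();
    start.store(true, std::memory_order_release);
    std::this_thread::sleep_for(run_time);
    stop.store(true, std::memory_order_relaxed);
    for (auto& worker : workers) {
        worker.join();
    }
    const auto end = std::chrono::steady_clock::now();

    std::uint64_t total = 0;
    for (const auto& counter : counters) {
        total += counter.value.load();
    }
    const double seconds = std::chrono::duration<double>(end - begin).count();
    return static_cast<double>(total) / seconds;
}

// Coherence overhead as defined in the caption: the rate without coherence
// traffic divided by the rate with a single shared, coherent counter.
double coherence_overhead(std::size_t num_threads,
                          std::chrono::milliseconds run_time) {
    const double coherent = measure_rate(num_threads, true, run_time);
    const double non_coherent = measure_rate(num_threads, false, run_time);
    return non_coherent / coherent;
}
```

Calling `coherence_overhead(n, std::chrono::milliseconds(1000))` for increasing `n` gives one point per thread count. Without pinning, the operating system may place several threads on the same core, so the numbers are only indicative and will not line up with NUMA node boundaries the way the pinned measurement in the figure does.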

Theorems & Definitions (2)

  • Definition 1
  • Definition 2