
SWARM: Replicating Shared Disaggregated-Memory Data in No Time

Antoine Murat, Clément Burgelin, Athanasios Xygkis, Igor Zablotchi, Marcos K. Aguilera, Rachid Guerraoui

TL;DR

This work proposes SWARM (Swift WAit-free Replication in disaggregated Memory), the first replication scheme for in-disaggregated-memory shared objects to provide single-roundtrip reads and writes, and builds SWARM-KV, a low-latency, strongly consistent and highly available disaggregated key-value store.

Abstract

Memory disaggregation is an emerging data center architecture that improves resource utilization and scalability. Replication is key to ensuring the fault tolerance of applications, but replicating shared data in disaggregated memory is hard. We propose SWARM (Swift WAit-free Replication in disaggregated Memory), the first replication scheme for in-disaggregated-memory shared objects to provide (1) single-roundtrip reads and writes in the common case, (2) strong consistency (linearizability), and (3) strong liveness (wait-freedom). SWARM makes two independent contributions. The first is Safe-Guess, a novel wait-free replication protocol with single-roundtrip operations. The second is In-n-Out, a novel technique to provide conditional atomic update and atomic retrieval of large buffers in disaggregated memory in one roundtrip. Using SWARM, we build SWARM-KV, a low-latency, strongly consistent and highly available disaggregated key-value store. We evaluate SWARM-KV and find that it has marginal latency overhead compared to an unreplicated key-value store, and that it offers much lower latency and better availability than FUSEE, a state-of-the-art replicated disaggregated key-value store.
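The abstract and the caption of Figure 1 describe Safe-Guess's fast path only at a high level. The Python sketch below is a single-process, sequential simulation of that idea under our own assumptions. It is not the paper's protocol.

In the sketch, each replica is a max register over (timestamp, value) entries. A write guesses a timestamp from a hypothetical per-client counter and writes it to a majority. In the same roundtrip, it reads those replicas to confirm that no larger timestamp exists. A read takes the fast path when the highest entry it sees is tagged as verified.

The slow paths (retrying with a larger timestamp, and writing back on a read) are stand-ins we chose. All class and method names are hypothetical. Concurrency, failures, and wait-freedom are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical timestamp: (counter, client_id), compared lexicographically.
Timestamp = Tuple[int, int]


@dataclass(frozen=True)
class Entry:
    ts: Timestamp
    value: Optional[str]
    verified: bool = False  # set once the write is known to be complete


class MaxRegisterReplica:
    """One replica: keeps only the entry with the largest timestamp."""

    def __init__(self) -> None:
        self.entry = Entry((0, 0), None, verified=True)

    def write_max(self, e: Entry) -> None:
        # A larger timestamp wins; rewriting the same timestamp can only add the tag.
        if e.ts > self.entry.ts or (e.ts == self.entry.ts and e.verified):
            self.entry = e

    def read(self) -> Entry:
        return self.entry


class Client:
    def __init__(self, cid: int, replicas: List[MaxRegisterReplica]) -> None:
        self.cid = cid
        self.replicas = replicas
        self.quorum = len(replicas) // 2 + 1
        self.counter = 0  # hypothetical source of guessed timestamps

    def _write_majority(self) -> List[MaxRegisterReplica]:
        return self.replicas[: self.quorum]

    def _read_majority(self) -> List[MaxRegisterReplica]:
        # A different majority; any two majorities share at least one replica.
        return self.replicas[-self.quorum :]

    def write(self, value: str, verify: bool = True) -> bool:
        """Return True if the write completed on the one-roundtrip fast path."""
        self.counter += 1
        guess = Entry((self.counter, self.cid), value)
        seen: List[Timestamp] = []
        for r in self._write_majority():
            seen.append(r.read().ts)  # the parallel read, modeled as landing just before the write
            r.write_max(guess)
        fast = max(seen) < guess.ts  # the guessed timestamp was fresh
        if not fast:
            # Slow path (assumption): retry with a timestamp above everything seen.
            self.counter = max(seen)[0] + 1
            guess = Entry((self.counter, self.cid), value)
            for r in self._write_majority():
                r.write_max(guess)
        if verify:  # SWARM tags writes as verified in the background; done inline here
            for r in self._write_majority():
                r.write_max(Entry(guess.ts, value, verified=True))
        return fast

    def read(self) -> Tuple[Optional[str], bool]:
        """Return (value, took_fast_path)."""
        top = max((r.read() for r in self._read_majority()), key=lambda e: (e.ts, e.verified))
        if top.verified:
            return top.value, True
        # Slow path (assumption): write the entry back as verified, then return it.
        for r in self._read_majority():
            r.write_max(Entry(top.ts, top.value, verified=True))
        return top.value, False


replicas = [MaxRegisterReplica() for _ in range(3)]
c1, c2 = Client(1, replicas), Client(2, replicas)
c1.write("a")
c1.write("b")
print(c2.write("c"))  # False: guess (1, 2) is below (2, 1), so it retries with (3, 2)
print(c1.read())  # ('c', True)
```

A stale guess is detected because the write's majority overlaps the majority that holds the latest timestamp. In this sketch the only cost of a bad guess is the extra retry.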

Paper Structure

This paper contains 64 sections, 13 theorems, 13 figures, and 3 tables.

Key Result

Theorem A.1

The algorithm labeled alg:wmr satisfies validity.

Figures (13)

  • Figure 1: Fast reads complete in one roundtrip to the replicas in disaggregated memory by finding a value tagged as verified. Fast writes complete in one roundtrip by writing their value with a guessed timestamp and confirming the freshness of the latter via a parallel read. Successful writes are tagged as verified in the background.
  • Figure 2: A max register read reports the wrong maximum out of writes concurrent to it. Mismatching majorities lead to a read of 1, despite 1 being written after 2. However, no subsequent read can return 1, as write 2 will be over.
  • Figure 3: Outline of an In-n-Out write. In one roundtrip, the thread (1) writes to an out-of-place buffer, (2) updates the metadata to point to it, and (3) updates the in-place data.
  • Figure 4: Architecture of SWARM-KV with a single key. Clients use SWARM reads and writes to directly access the value replicated in disaggregated memory. The replicas that form Safe-Guess' max register are spread across different memory nodes and implemented via In-n-Out max registers. The location of the replicas is stored in a reliable index.
  • Figure 5: Latency CDFs of RAW, SWARM-KV, DM-ABD and FUSEE with YCSB workload B (95% gets and 5% updates), Zipfian key distribution, and 4 clients.
  • ...and 8 more figures
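Figure 3 lists the three steps of an In-n-Out write. The abstract states that the technique provides conditional atomic update and atomic retrieval of large buffers in one roundtrip. The Python sketch below simulates one memory node's state for a single object in a sequential setting.

Several details are our assumptions, not the paper's:

- The condition is a compare-and-swap on a small metadata word, checked against the writer's cached copy.
- The in-place copy carries the timestamp it was written with.
- A reader fetches the metadata and the in-place copy together. It trusts the in-place copy only if the timestamps match; otherwise it follows the pointer in a second roundtrip.

Real disaggregated memory would also need to detect torn in-place writes (for example, with a checksum), which this sketch omits. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Meta = Tuple[int, Optional[int]]  # (timestamp, id of the out-of-place buffer)


class InNOutObject:
    """Simulated memory-node state for one object (all names hypothetical)."""

    def __init__(self) -> None:
        self.buffers: Dict[int, Tuple[int, bytes]] = {}  # out-of-place copies
        self.meta: Meta = (0, None)  # small word that only changes through CAS
        self.in_place: Tuple[int, bytes] = (0, b"")  # copy tagged with its timestamp

    def cas_meta(self, expected: Meta, new: Meta) -> Meta:
        """Compare-and-swap on the metadata word; returns the previous value."""
        old = self.meta
        if old == expected:
            self.meta = new
        return old


def write(obj: InNOutObject, cached: Meta, ts: int, data: bytes, buf_id: int) -> bool:
    """The three steps of Figure 3, applied in order as one simulated roundtrip.

    Assumes ts values are unique and buf_id is a fresh buffer for this write.
    Returns True if the conditional metadata update took effect.
    """
    obj.buffers[buf_id] = (ts, data)  # (1) write the out-of-place buffer
    old = obj.cas_meta(cached, (ts, buf_id))  # (2) point the metadata at it, if unchanged
    obj.in_place = (ts, data)  # (3) in-place write, posted without waiting for the CAS result
    return old == cached


def read(obj: InNOutObject) -> Tuple[bytes, int]:
    """Return (data, roundtrips_used) in a sequential setting."""
    meta = obj.meta  # roundtrip 1 fetches the metadata...
    ts, data = obj.in_place  # ...and the in-place copy together
    if ts == meta[0]:
        return data, 1  # the in-place copy matches the metadata
    _, data = obj.buffers[meta[1]]  # roundtrip 2: follow the pointer
    return data, 2


obj = InNOutObject()
print(write(obj, (0, None), 1, b"hello", buf_id=1))  # True
print(write(obj, (0, None), 2, b"stale", buf_id=2))  # False: metadata is already (1, 1)
print(read(obj))  # (b'hello', 2): in-place timestamp 2 does not match metadata timestamp 1
```

Posting the in-place write without waiting for the CAS result keeps every write at one roundtrip. Under these assumptions, the cost is that a writer whose CAS fails leaves a mismatched in-place copy. Readers then pay a second roundtrip until the next successful write.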

Theorems & Definitions (14)

  • Theorem A.1
  • Lemma A.2
  • Lemma A.3
  • Theorem A.4
  • Theorem A.5
  • Theorem A.6
  • Theorem B.1
  • Theorem B.2
  • Theorem B.3
  • Definition 1
  • ...and 4 more