The Cost of Garbage Collection for State Machine Replication

Zhiying Liang, Vahab Jabrayilov, Aleksey Charapko, Abutalib Aghayev

TL;DR

The paper tackles the problem of GC overhead in cloud-based state machine replication by building Replicant, a MultiPaxos-based in-memory KV store, in C++, Java, Rust, and Go and evaluating it under realistic AWS/YCSB workloads. It combines architectural design, implementation choices, and extensive experiments to quantify how GC affects throughput and tail latency, especially under memory and vCPU constraints. The key contributions are (1) a modular Replicant design implemented across four languages, (2) a systematic comparison of GC vs manual memory management in a production-like SMR system, and (3) practical insights into how language choice impacts long-term cloud costs and performance under cloud virtualization. The findings indicate that manual memory management can yield 1–2 orders of magnitude throughput benefits in tight tail-latency regimes and that cloud systems built with GC risk higher cloud costs and degraded tail latency at scale, guiding decisions for cloud system design and language selection.

Abstract

State Machine Replication (SMR) protocols form the backbone of many distributed systems. Enterprises and startups increasingly build their distributed systems on the cloud due to its many advantages, such as scalability and cost-effectiveness. One of the first technical questions companies face when building a system on the cloud is which programming language to use. Among many factors that go into this decision is whether to use a language with garbage collection (GC), such as Java or Go, or a language with manual memory management, such as C++ or Rust. Today, companies predominantly prefer languages with GC, like Go, Kotlin, or even Python, due to ease of development; however, there is no free lunch: GC costs resources (memory and CPU) and performance (long tail latencies due to GC pauses). While there have been anecdotal reports of reduced cloud cost and improved tail latencies when switching from a language with GC to a language with manual memory management, so far, there has not been a systematic study of the GC overhead of running an SMR-based cloud system. This paper studies the overhead of running an SMR-based cloud system written in a language with GC. To this end, we design from scratch a canonical SMR system -- a MultiPaxos-based replicated in-memory key-value store -- and we implement it in C++, Java, Rust, and Go. We compare the performance and resource usage of these implementations when running on the cloud under different workloads and resource constraints and report our results. Our findings have implications for the design of cloud systems.

Paper Structure (27 sections, 14 figures, 1 table)

Figures (14)

  • Figure 1: Replicant Design
  • Figure 2: MultiPaxos Design
  • Figure 3: Throughput vs. average latency of Replicant implementations in different languages, (a) using gRPC for inter-peer communication, and (b) using TCP for inter-peer communication, under YCSB workload A with 2 million entries, which mitigates GC pressure. The data points correspond to 8, 16, 32, 64, 128, 192, and 256 concurrent clients; e.g., the circled data point in (a) corresponds to 64-client results for C++ and Java implementations. Data points with higher than 6 ms average latency are not shown.
  • Figure 4: The latency CDFs generated using (a) default and (b) true latency numbers reported by YCSB for Replicant implementations in C++ and Java running the experiment of Figure 3 (a) with 64 clients and target throughput of 20 Kops/s, and with server memory limited to 5 GiB.
  • Figure 5: The latency CDFs of Replicant implementations in Go and in Rust (using the stock allocator and jemalloc allocator with Nagle algorithm turned off and with TCP Quick ACK) running the experiment of Figure 3 (b) with 64 clients and target throughput of 50 Kops/s.
  • ...and 9 more figures