Catalpa: GC for a Low-Variance Software Stack

Anthony Arnold, Mark Marron

TL;DR

This paper tackles tail latency and performance variability in garbage-collected runtimes by introducing Catalpa, a garbage collector designed for the Bosque language. Catalpa combines a copying nursery with a reference-counted old space. It relies on Bosque’s immutability and freedom from reference cycles to avoid barriers and remembered sets, and it performs a fixed amount of work per allocation with constant memory overhead. The authors formalize a no-tradeoff "memory-subsystem happiness" property, prove the relevant bounds, and report experiments in which Catalpa delivers highly predictable tail latencies (with 50th percentile pauses around the low hundreds of milliseconds and 99th percentile under a few hundred) and low memory overhead, while remaining robust under parameter variation and competing well with state-of-the-art Java GCs. The work suggests a feasible path to reliable, low-latency runtimes for modern software stacks by aligning language design, memory management, and runtime behavior toward memoryless execution and bounded overheads.

Abstract

The performance of an application/runtime is usually conceptualized as a continuous function: the less memory/time used on a given workload, the better the compiler/runtime. However, in practice, good performance of an application is viewed as more of a binary function: either the application responds in under, say, 100 ms and provides a good user experience, or it takes a noticeable amount of time, leaving the user waiting and potentially abandoning the task. Thus, performance really means how often the application is fast enough to meet user expectations, leading industrial developers to focus on the 95th and 99th percentile tail latencies as heavily as, or more so than, average response time. Our vision is to create a software stack that actively supports these needs via programming language and runtime system design. In this paper we present a novel garbage-collector design, the Catalpa collector, for the Bosque programming language and runtime. The collector is designed to minimize latency and tail-latency variability while maintaining high throughput and incurring small memory overheads. To achieve these goals we leverage various features of the Bosque language, including immutability and reference-cycle freedom, to construct a collector that has provably bounded collection pauses, incurs a fixed-constant memory overhead, and ensures starvation freedom for the application.

Paper Structure

This paper contains 28 sections, 5 theorems, 4 figures, 7 tables.

Key Result

Theorem 1

The work done by the garbage collector (GC) for each allocation is fixed and bounded by the function $O(\text{Field}_{\text{ct}} \cdot (\text{Cost}(\text{Mark} + \text{Fwd} + \text{Inc} + \text{Dec}) + \text{Cost}(\text{Alloc} + \text{Copy} + \text{Release})))$, and does not depend on the lifetime or be

Figures (4)

  • Figure 1: Max Temperature Range in the Bosque Programming Language.
  • Figure 2: Top-level allocation (size-segregated) and collection algorithms. The Allocator class is templated on object sizes K and maintains a free-list of available locations, per memory page, to allocate from in the freelist field. The collect function shows the high-level steps of the collection algorithm, which runs when the nursery is full (default of 8).
  • Figure 3: Evolution of the (logical) memory layout during a collection cycle -- from initial state, to marking young objects, evacuating young objects, adjusting reference counts, and finally resetting the nursery.
  • Figure 4: Catalpa Memory organization of pages, size-segregated allocators, and per-page free-lists. Each allocator manages a set of pages for allocation/evacuation, and partially filled pages for the RC-old space.
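
The collection cycle described in the captions of Figures 2 and 3 (mark young objects, evacuate them, adjust reference counts, reset the nursery) can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Obj`, `ToyHeap`, `NURSERY_LIMIT`, and `refs` are hypothetical, the nursery capacity is counted in objects purely for illustration, roots are held in a plain list, and releasing dead old objects scans the whole old space, so the sketch does not reproduce the paper's bounded-work guarantees.

```python
from dataclasses import dataclass

NURSERY_LIMIT = 8  # hypothetical capacity, counted in objects for this toy model


@dataclass(eq=False)  # eq=False: objects compare by identity
class Obj:
    refs: tuple = ()    # immutable references to other Obj instances
    young: bool = True  # True while the object lives in the nursery
    rc: int = 0         # count of references held by old-space objects


class ToyHeap:
    def __init__(self):
        self.nursery = []  # young objects allocated since the last collection
        self.old = []      # reference-counted old space
        self.roots = []    # stand-in for stack and register roots

    def alloc(self, *refs):
        if len(self.nursery) >= NURSERY_LIMIT:
            # Keep the new object's children alive across the collection.
            self.collect(extra_roots=refs)
        obj = Obj(refs=tuple(refs))
        self.nursery.append(obj)
        return obj

    def collect(self, extra_roots=()):
        roots = list(self.roots) + list(extra_roots)
        # 1. Mark: find young objects reachable from the roots. Old objects
        #    are not traversed: with immutable data an old object cannot
        #    point to a younger one, so no remembered set is consulted.
        survivors, seen = [], set()
        stack = [r for r in roots if r.young]
        while stack:
            obj = stack.pop()
            if id(obj) in seen:
                continue
            seen.add(id(obj))
            survivors.append(obj)
            stack.extend(child for child in obj.refs if child.young)
        # 2. Evacuate: promote the survivors into the old space.
        for obj in survivors:
            obj.young = False
            self.old.append(obj)
        # 3. Adjust reference counts: promoted objects now hold counted
        #    references to their children; unreferenced, unrooted old
        #    objects are released, which decrements their children in turn.
        for obj in survivors:
            for child in obj.refs:
                child.rc += 1
        rooted = {id(r) for r in roots}
        dead = [o for o in self.old if o.rc == 0 and id(o) not in rooted]
        while dead:
            obj = dead.pop()
            self.old.remove(obj)
            for child in obj.refs:
                child.rc -= 1
                if child.rc == 0 and id(child) not in rooted:
                    dead.append(child)
        # 4. Reset the nursery: anything left in it was unreachable.
        self.nursery.clear()


# Usage: a rooted node keeps its leaf alive; an unrooted object is dropped.
heap = ToyHeap()
leaf = heap.alloc()
node = heap.alloc(leaf)
garbage = heap.alloc()
heap.roots.append(node)
heap.collect()
```

In this model the absence of cycles is what lets plain reference counting reclaim the old space completely: once `node` is dropped from the roots, the next collection releases it and then `leaf` without any tracing of old objects.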

Theorems & Definitions (5)

  • Theorem 1: Fixed Work Per Allocation
  • Theorem 2: Bounded Collector Pauses
  • Theorem 3: Effective Collections
  • Theorem 4: Fixed Memory Overhead w.r.t. Application Memory Usage
  • Theorem 5: Memory Subsystem Happiness