Catalpa: GC for a Low-Variance Software Stack
Anthony Arnold, Mark Marron
TL;DR
This paper tackles tail latency and performance variability in garbage-collected runtimes by introducing Catalpa, a novel GC designed for the Bosque language. Catalpa combines a copying nursery with a reference-counted old space, leverages Bosque’s immutability and cycle freedom to avoid write barriers and remembered sets, and performs a fixed amount of work per allocation with a constant memory overhead. The authors formalize a no-tradeoff memory-subsystem happiness property, prove bounds on collection pauses and memory overhead, and show through extensive experiments that Catalpa delivers highly predictable tail latencies (50th-percentile pauses around the low hundreds of milliseconds and 99th-percentile pauses under a few hundred milliseconds) with low memory overhead, remains robust under parameter variation, and competes well with state-of-the-art Java GCs. The work suggests a feasible path to reliable, low-latency runtimes for modern software stacks by aligning language design, memory management, and runtime behavior toward memoryless execution and bounded overheads.
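The central design point above, that immutability lets a copying-nursery plus reference-counted old-space collector skip write barriers and remembered sets, can be illustrated with a deliberately simplified Python model. This is a hypothetical sketch, not Catalpa's implementation: the names `Obj`, `ToyHeap`, `add_root`, `drop_root`, and `collect`, the explicit root list, and the zero-count set `zct` (borrowed from classic deferred reference counting) are all assumptions made for illustration. Promotion flips a flag instead of copying bytes, and old-space reclamation runs in one pass at collection time rather than in the fixed per-allocation increments Catalpa is described as performing.

```python
class Obj:
    """A toy immutable heap object: its children are fixed when it is created."""
    __slots__ = ("name", "children", "in_nursery", "rc")

    def __init__(self, name, children=()):
        self.name = name
        self.children = tuple(children)  # never mutated after construction
        self.in_nursery = True
        self.rc = 0  # counts references from old-space objects and root slots only


class ToyHeap:
    """Hypothetical two-generation heap: traced nursery plus reference-counted old space."""

    def __init__(self):
        self.nursery = []  # objects allocated since the last collection
        self.old = set()   # promoted objects, reclaimed by reference counting
        self.roots = []    # root slots (stand-in for stack and global references)
        self.zct = set()   # old objects whose count hit zero; freed at the next collection

    def alloc(self, name, children=()):
        obj = Obj(name, children)
        self.nursery.append(obj)
        return obj

    def add_root(self, obj):
        self.roots.append(obj)
        if not obj.in_nursery:
            obj.rc += 1

    def drop_root(self, obj):
        self.roots.remove(obj)
        if not obj.in_nursery:
            self._dec(obj)

    def _dec(self, obj):
        obj.rc -= 1
        if obj.rc == 0:
            # Defer freeing: an uncounted nursery object may still point at obj.
            self.zct.add(obj)

    def collect(self):
        # 1. Trace the nursery from the roots only. Immutability means an old
        #    object can never point at a younger one, so no write barrier or
        #    remembered set is needed to find old-to-young pointers.
        live, stack = set(), [o for o in self.roots if o.in_nursery]
        while stack:
            o = stack.pop()
            if o not in live:
                live.add(o)
                stack.extend(c for c in o.children if c.in_nursery)
        # 2. Promote survivors (a flag flip stands in for copying) and start
        #    counting the references they hold and the root slots naming them.
        for o in live:
            o.in_nursery = False
            self.old.add(o)
        for o in live:
            for c in o.children:
                c.rc += 1
        for r in self.roots:
            if r in live:
                r.rc += 1
        # Dead nursery objects were never counted, so dropping them costs no RC work.
        self.nursery.clear()
        # 3. Every live reference is now counted, so a zero count means unreachable;
        #    cycle freedom means reference counting alone reclaims all garbage.
        while self.zct:
            o = self.zct.pop()
            if o.rc == 0:  # skip entries whose count rose again
                self.old.discard(o)
                for c in o.children:
                    self._dec(c)
```

For example, allocating `a`, then `b = h.alloc("b", [a])`, rooting `b`, and calling `h.collect()` moves both objects to `h.old` with a count of one each, while any unrooted nursery object is dropped without touching a single reference count.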
Abstract
The performance of an application/runtime is usually conceptualized as a continuous function: the less memory/time used on a given workload, the better the compiler/runtime. In practice, however, the performance of an application is viewed as more of a binary function: either the application responds in under, say, 100 ms and provides a good user experience, or it takes a noticeable amount of time, leaving the user waiting and potentially abandoning the task. Thus, performance really means how often the application is fast enough to meet user expectations, which leads industrial developers to focus on 95th- and 99th-percentile tail latencies as heavily as, or more heavily than, average response time. Our vision is to create a software stack that actively supports these needs through programming language and runtime system design. In this paper we present a novel garbage-collector design, the Catalpa collector, for the Bosque programming language and runtime. This collector is designed to minimize latency and tail-latency variability while maintaining high throughput and incurring small memory overheads. To achieve these goals we leverage various features of the Bosque language, including immutability and reference-cycle freedom, to construct a collector that has provably bounded collection pauses, incurs a fixed-constant memory overhead, and ensures starvation freedom for the application!
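The gap between the continuous and the binary view of performance is easy to see with a small calculation. The Python sketch below computes nearest-rank percentiles and the share of requests that meet a latency budget; the latency samples are made up for illustration and are not measurements from this paper, and the helper names `percentile` and `fraction_within` are hypothetical.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def fraction_within(samples, budget_ms):
    """Share of requests that finish within the latency budget."""
    return sum(1 for s in samples if s <= budget_ms) / len(samples)

# Hypothetical request latencies in milliseconds (made up for illustration).
latencies = [20] * 90 + [40] * 5 + [150] * 4 + [900]

mean_ms = sum(latencies) / len(latencies)
print(f"mean={mean_ms:.1f} ms")  # prints mean=35.0 ms
for p in (50, 95, 99):
    # prints p50=20 ms, then p95=40 ms, then p99=150 ms
    print(f"p{p}={percentile(latencies, p)} ms")
print(f"within 100 ms: {fraction_within(latencies, 100):.0%}")  # prints 95%
```

A mean of 35 ms looks comfortably fast, yet one request in twenty misses the 100 ms budget, which is exactly what the 99th-percentile figure exposes and the average hides.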
