Garbage Collection for Rust: The Finalizer Frontier

Jacob Hughes, Laurence Tratt

TL;DR

Alloy presents a Rust-oriented GC that reuses destructors as finalizers via a Gc<T> API, integrating a conservative backend (BDWGC) and addressing classic finalizer challenges with Finalizer Safety Analysis, Premature Finalizer Prevention, and Finalizer Elision, plus a separate finalizer thread. The approach leverages Rust’s ownership guarantees to support most destructors as finalizers while ensuring safety in the presence of cycles and cross-thread execution. Empirical evaluation shows a modest wall-clock overhead (~5%) with variable memory footprints and clear benefits from finalizer elision, though performance is context-dependent relative to arena-based or reference-counting-based approaches. The work outlines practical pathways for integrating GC into Rust codebases and highlights both the potential gains and the need for further refinement to reach production readiness.

Abstract

Rust is a non-Garbage Collected (GCed) language, but the lack of GC makes expressing data-structures that require shared ownership awkward, inefficient, or both. In this paper we explore a new design for, and implementation of, GC in Rust, called Alloy. Unlike previous approaches to GC in Rust, Alloy allows existing Rust destructors to be automatically used as GC finalizers: this makes Alloy integrate better with existing Rust code than previous solutions but introduces surprising soundness and performance problems. Alloy provides novel solutions for the core problems: finalizer safety analysis rejects unsound destructors from automatically being reused as finalizers; finalizer elision optimises away unnecessary finalizers; and premature finalizer prevention ensures that finalizers are only run when it is provably safe to do so.
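To make the shared-ownership problem concrete, the following is a minimal sketch using only the standard library's reference-counted `Rc<T>`, not Alloy itself. The `Node` type, the `drops` counter, and the function names are hypothetical, introduced purely for illustration. It shows that a destructor (`Drop::drop`) runs for an acyclic structure but never runs once two nodes refer to each other, which is the kind of situation where a tracing GC that can run destructors as finalizers becomes attractive.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical node type; `drops` counts how many destructors have run.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    drops: Rc<Cell<usize>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        // An ordinary Rust destructor. With plain `Rc`, it only runs when
        // the strong reference count of the node reaches zero.
        self.drops.set(self.drops.get() + 1);
    }
}

fn new_node(drops: &Rc<Cell<usize>>) -> Rc<Node> {
    Rc::new(Node {
        next: RefCell::new(None),
        drops: Rc::clone(drops),
    })
}

/// Builds the chain a -> b and returns how many destructors ran.
fn drops_without_cycle() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let a = new_node(&drops);
        let b = new_node(&drops);
        *a.next.borrow_mut() = Some(b);
    } // `a` goes out of scope: its destructor runs, then `b`'s.
    drops.get()
}

/// Builds the cycle a -> b -> a and returns how many destructors ran.
fn drops_with_cycle() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let a = new_node(&drops);
        let b = new_node(&drops);
        *a.next.borrow_mut() = Some(Rc::clone(&b));
        *b.next.borrow_mut() = Some(Rc::clone(&a));
    } // Each node still holds a strong reference to the other: both leak.
    drops.get()
}
```

Calling `drops_without_cycle()` reports two destructor runs, while `drops_with_cycle()` reports none because the cycle keeps both reference counts above zero. Standard workarounds such as `Weak<T>` require the programmer to decide in advance which edges do not own their target.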

Paper Structure

This paper contains 30 sections, 5 figures, 13 tables, 3 algorithms.

Figures (5)

  • Figure 1: Comparing the effects of and on wall-clock time; heap footprint (i.e. the size of the live set); and RSS. The baseline at 1 is ; values less than 1 show as better than ; and the blue vertical line shows the geometric mean of ratios. The wall-clock times of and are similar; the RSS somewhat similar; and the average heap footprint often very different. Broadly speaking, increases the average heap footprint because GC, and especially finalization, causes values to live for longer. Benchmarks which allocate relatively little memory (particularly Ripgrep as shown in \ref{tab:app:mem:conversion:runtime}) can exaggerate this effect. Perhaps surprisingly, the heap footprint and RSS do not correlate. This is partly because the sample rate for RSS is rather low (which notably affects fast-running benchmarks such as those for fd) and partly because RSS necessarily includes headroom, that is, memory beyond that needed for the live set (and which may later be returned to the OS).
  • Figure 2: A selection of time-series data with various GC approaches, showing normalised time on the $x$-axis and heap footprint (i.e. the amount of live memory, with 99% confidence intervals shaded) on the $y$-axis. Binary Trees shows an example of Alloy having a comparable heap footprint to ; Rust-GC's heap footprint is around 4$\times$ greater. Binary Trees is a perfect fit to : it frees memory in one batch at the end, and because it is 3$\times$ faster than Alloy, this 'wind down' period is a substantial portion of the overall (quick!) execution. Ripgrep Alternates may seem to be an example of a memory leak in Alloy, but it is really the result of the inevitable delay that GC imposes on noticing that values are no longer live, which is exacerbated by the presence of finalizers. The frequent plateaus and dips show that memory is being freed, but at a later point than one might initially expect. In contrast, som-rs-bc JSON Small shows a real memory leak due to cyclic objects in , where Alloy's heap footprint remains steady.
  • Figure 3: The effects of finalizer elision on various metrics. The top-left chart shows the proportion of run-time values that: have had their finalizers elided; cannot have their finalizers elided; have no finalizers to elide. This chart is best read in conjunction with \ref{tab:app:mem:conversion:runtime} to get a sense of (a) the quantity of run-time memory involved and (b) how much indirectly owned memory the values have. The other plots use 'no finalizer elision' as the normalization base (i.e. values below 1 show that finalizer elision improves a metric). Total GC pause time is the cumulative time spent in stop-the-world collections. User time captures the time spent in all threads, including the finalizer thread. Broadly speaking, the more finalizers are elided, and the greater the proportion of the overall heap the memory owned by , the better the metrics become.
  • Figure 4: The effect of premature finalization optimisation, normalised to None (i.e. no fences). Grey bars represent the ratio for naive (all possible fences) and blue bars optimised (obviously unnecessary fences removed). Unfortunately, this attempted optimisation has no statistically significant effects.
  • Figure 5: Wall-clock and user time performance comparison for finalizer elision on each benchmark. The bars show the relative performance of Alloy after applying our elision optimization, normalized against the baseline (solid black line). The vertical blue line marks the overall geometric mean (with shaded area for CIs). User time often shows greater improvement than wall-clock time, as elision reduces the CPU overhead of the finalization thread.
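
Several of the captions above summarise per-benchmark ratios with a geometric mean, where each ratio is a measurement divided by its baseline so that 1 means "no change". As a rough sketch of that arithmetic only (the function name `geometric_mean` is hypothetical, and the paper's actual aggregation and confidence-interval procedure is not described in this section), the geometric mean of ratios $r_1, \dots, r_n$ can be computed as $\exp\left(\frac{1}{n}\sum_i \ln r_i\right)$:

```rust
/// Geometric mean of strictly positive ratios, computed in log space
/// to avoid overflow or underflow from multiplying many values.
/// Returns `None` for an empty slice or any non-positive or NaN ratio.
fn geometric_mean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|&r| !(r > 0.0)) {
        return None;
    }
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}
```

Unlike an arithmetic mean, this treats a benchmark that halves a metric (ratio 0.5) and one that doubles it (ratio 2.0) as cancelling out, which is why it is the usual choice for normalised ratios.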