Partial Evaluation, Whole-Program Compilation

Chris Fallin, Maxwell Bernstein

TL;DR

This work introduces a partial evaluator that can compile a whole guest-language function ahead-of-time, without tracing or profiling, “for free,” and outlines an approach to carry this work further, deriving more of the capabilities of a JIT backend from first principles while retaining correctness.

Abstract

There is a tension in dynamic language runtime design between speed and correctness: state-of-the-art JIT compilation, the result of enormous industrial investment and significant research, achieves heroic speedups at the cost of complexity that can result in serious correctness bugs. Much of this complexity comes from the existence of multiple tiers and the need to maintain correspondence between these separate definitions of the language's semantics; also, from the indirect nature of the semantics implicitly encoded in a compiler backend. One way to address this complexity is to automatically derive, as much as possible, the compiled code from a single source of truth; for example, the interpreter tier. In this work, we introduce a partial evaluator that can derive compiled code “for free” by specializing an interpreter with its bytecode. This transform operates on the interpreter body at a basic-block IR level and is applicable to almost unmodified existing interpreters in systems languages such as C or C++. We show the effectiveness of this new tool by applying it to the interpreter tier of an existing industrial JavaScript engine, SpiderMonkey, yielding $2.17\times$ speedups, and the PUC-Rio Lua interpreter, yielding $1.84\times$ speedups with only three hours' effort. Finally, we outline an approach to carry this work further, deriving more of the capabilities of a JIT backend from first principles while retaining semantics-preserving correctness.
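As a rough illustration of what “specializing an interpreter with its bytecode” means, the following C sketch pairs a tiny stack interpreter with the function one would hope a partial evaluator leaves behind once the bytecode is fixed. The opcode set, the names `interpret`, `specialized`, and `PROGRAM`, and the hand-written residual function are all assumptions made for illustration; none of them are taken from the paper's tool, SpiderMonkey, or the Lua interpreter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-opcode stack bytecode (illustrative only). */
enum { OP_PUSH, OP_ADD, OP_RET };

/* A fixed guest program: computes arg + 2 + 3. */
const uint8_t PROGRAM[] = { OP_PUSH, 2, OP_ADD, OP_PUSH, 3, OP_ADD, OP_RET };

/* Generic interpreter: the bytecode is an ordinary runtime input,
 * so every step pays for opcode fetch, dispatch, and stack traffic. */
int64_t interpret(const uint8_t *code, int64_t arg) {
    int64_t stack[16];
    int sp = 0;
    size_t pc = 0;
    stack[sp++] = arg;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: /* push the immediate operand that follows the opcode */
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:  /* pop two values, push their sum */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_RET:  /* return the top of the stack */
            return stack[sp - 1];
        default:      /* unknown opcode: stop with a sentinel value */
            return -1;
        }
    }
}

/* Hand-written stand-in for the residual code of interpret(PROGRAM, arg):
 * with the bytecode known, the dispatch loop and operand stack fold away. */
int64_t specialized(int64_t arg) {
    return arg + 2 + 3;
}
```

Calling `interpret(PROGRAM, 10)` and `specialized(10)` gives the same result; the second function simply has no interpreter overhead left. In this sketch the residual function is written by hand, whereas the point of the work summarized above is to derive it automatically from the interpreter itself.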

Paper Structure

This paper contains 28 sections, 12 figures.

Figures (12)

  • Figure 1: A sketch of an interpreter loop written in C.
  • Figure 2: Compiled code resulting from constant propagation of interpret from Fig. 1 on one opcode.
  • Figure 3: An illustration of constant propagation over an interpreter loop: with one iteration, we can deduce constant values, but multiple iterations cause the analysis to degrade to "unknown" because all iterations are considered together.
  • Figure 4: Annotations to context-specialize analysis of an interpreter function.
  • Figure 5: Pseudocode for the main specialization (Futamura projection) algorithm.
  • ...and 7 more figures