TPDE: A Fast Adaptable Compiler Back-End Framework

Tobias Schwarz, Tobias Kamm, Alexis Engelke

TL;DR

TPDE is a fast, adaptable compiler back-end framework for SSA-form IRs that eliminates the need for an IR translation step. By coupling an IR adapter with architecture-aware instruction compilers and, optionally, architecture-specific snippet encoders derived from LLVM Machine IR, TPDE generates code in a single pass, preceded by a separate liveness analysis that guides register allocation and spilling. The framework is demonstrated via back-ends for LLVM-IR (x86-64/AArch64), Cranelift IR for WebAssembly, and Umbra IR, achieving 8-24x faster compile times than LLVM -O0 with comparable run-time performance and substantial end-to-end improvements in JIT contexts. These results show that adopting TPDE substantially reduces compilation latency across diverse IRs while keeping code quality competitive, enabling faster start-up for dynamic languages, databases, and WebAssembly runtimes. The approach also lowers maintenance and porting costs by reusing high-level snippet encoders and providing IR-agnostic components that can be combined to target new architectures.

Abstract

Fast machine code generation is especially important for fast start-up just-in-time compilation, where compilation time is part of the end-to-end latency. However, widely used compiler frameworks like LLVM do not prioritize fast compilation and require an extra IR translation step, which increases latency even further, while rolling a custom code generator is a substantial engineering effort, especially when targeting multiple architectures. Therefore, in this paper, we present TPDE, a compiler back-end framework that adapts to existing code representations in SSA form. Using an IR-specific adapter that provides canonical access to IR data structures and a specification of the IR semantics, the framework performs one analysis pass and then compiles in a single pass that combines instruction selection, register allocation, and instruction encoding. The generated target instructions are primarily derived from code written in a high-level language through LLVM's Machine IR, easing portability to different architectures while enabling optimizations during code generation. To show the generality of our framework, we build a new back-end for LLVM from scratch targeting x86-64 and AArch64. Performance results on SPECint 2017 show that we can compile LLVM-IR 8-24x faster than LLVM -O0 while being on par in terms of run-time performance. We also demonstrate the benefits of adapting to domain-specific IRs in JIT contexts, particularly WebAssembly and database query compilation, where avoiding the extra IR translation further reduces compilation latency.
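
The split into one analysis pass followed by one code-generation pass can be illustrated with a deliberately small sketch. It is not TPDE's implementation: the `Inst` type, the straight-line (single basic block) program, the greedy register choice, and the lack of reloads for spilled values are all simplifying assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Inst:
    """Hypothetical SSA instruction; values are identified by dense numbers."""
    op: str
    result: Optional[int]            # value number defined here, or None
    operands: Tuple[int, ...] = ()   # value numbers used here


def last_uses(insts, num_values):
    """Analysis pass: index of the last instruction touching each value."""
    last = [-1] * num_values
    for i, inst in enumerate(insts):
        if inst.result is not None:
            last[inst.result] = i
        for v in inst.operands:
            last[v] = i
    return last


def compile_single_pass(insts, num_values, num_regs):
    """Code-generation pass: pick locations and emit one record per instruction."""
    last = last_uses(insts, num_values)
    free = list(range(num_regs - 1, -1, -1))  # pop() hands out register 0 first
    loc = {}        # value number -> ("reg", n) or ("stack", slot)
    next_slot = 0
    out = []
    for i, inst in enumerate(insts):
        srcs = [loc[v] for v in inst.operands]
        # Operands whose last use is this instruction release their registers,
        # so the result may reuse one of them.
        for v in dict.fromkeys(inst.operands):
            if last[v] == i and loc[v][0] == "reg":
                free.append(loc[v][1])
        if inst.result is None:
            out.append((inst.op, None, srcs))
            continue
        if free:
            dst = ("reg", free.pop())
        else:
            dst = ("stack", next_slot)  # simplification: value lives in memory
            next_slot += 1
        loc[inst.result] = dst
        out.append((inst.op, dst, srcs))
        # A result that is never used gives its register back immediately.
        if last[inst.result] == i and dst[0] == "reg":
            free.append(dst[1])
    return out
```

Because the liveness information is computed up front, the second loop never needs to look ahead or revisit an instruction, which is the property that makes a single combined pass possible.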

Paper Structure

This paper contains 64 sections, 10 figures.

Figures (10)

  • Figure 1: Overview of the TPDE compilation framework. The framework adapts to any IR in SSA form through an IR adapter, which exposes relevant IR properties in a canonical form, and instruction compilers, which provide the actual semantics for IR instructions. Instruction compilers can optionally make calls into instruction snippet encoders, which are generated ahead-of-time from a high-level language.
  • Figure 2: Functionality required from an IR adapter. All instances are referred to by handles; Value has multiple sub-types. Basic blocks need to provide 64 bits of inline storage; values need a per-function unique number that is suitable as an array index for accessing data structures inside the framework.
  • Figure 3: Overview of the snippet encoder generator targeting AArch64. Semantics are specified in a high-level language like C, which is compiled to the target-specific LLVM Machine IR. From there, we generate a function to generate the code, taking care of register allocation.
  • Figure 4: Although constant operands are folded into snippets, instructions with all-constant inputs are not eliminated. Providing separate snippets for constants can significantly improve the generated code for some operations. (Machine IR lowered to assembly for readability.)
  • Figure 5: Compile-time and run-time speedup normalized to LLVM -O0 on SPECint 2017 with unoptimized LLVM-IR. Compile time is back-end time, excluding the front-end and required LLVM-IR passes.
  • ...and 5 more figures
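
The adapter requirements named in the Figure 2 caption (handles, 64 bits of per-block storage, dense per-function value numbers) can be sketched over a toy IR. Everything below is hypothetical: the class name `ToyAdapter`, its method names, and the toy IR layout are assumptions for illustration and do not reflect TPDE's actual interface.

```python
MASK64 = (1 << 64) - 1


class ToyAdapter:
    """Hypothetical adapter over a toy IR.

    The toy IR is a list of blocks; each block is a list of
    (result_name, operand_names) pairs. Block handles are list indices.
    """

    def __init__(self, blocks):
        self._blocks = blocks
        self._aux = [0] * len(blocks)   # 64 bits of framework storage per block
        self._numbers = {}              # value name -> dense number
        for insts in blocks:
            for result, _ in insts:
                self._numbers.setdefault(result, len(self._numbers))

    def block_handles(self):
        return range(len(self._blocks))

    def insts(self, block):
        return self._blocks[block]

    def value_count(self):
        return len(self._numbers)

    def value_number(self, name):
        """Per-function unique number, usable directly as an array index."""
        return self._numbers[name]

    def block_aux(self, block):
        return self._aux[block]

    def set_block_aux(self, block, data):
        self._aux[block] = data & MASK64  # keep only the low 64 bits


def count_uses(adapter):
    """Framework-side code: a flat array indexed by value number, no hashing."""
    uses = [0] * adapter.value_count()
    for block in adapter.block_handles():
        for _, operands in adapter.insts(block):
            for name in operands:
                uses[adapter.value_number(name)] += 1
    return uses
```

The point of the dense numbering is visible in `count_uses`: the framework side can keep its per-value data in plain arrays and never needs to know how the IR itself represents values.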