Boosting Cross-Architectural Emulation Performance by Foregoing the Intermediate Representation Model

Amy Iris Parker

TL;DR

Cross-architectural emulation performance is hampered by QEMU's IR-based TCG translation. The paper proposes a three-tier emulation stack that inserts a direct binary translation middle tier for frequent host-guest pairings, aiming to balance development effort against runtime performance. It implements a 64-bit RISC-V proof-of-concept emulator (riscv-um) and a benchmark generator (benchgen), and reports up to $35\times$ faster execution than QEMU's TCG in synthetic tests, which indicates substantial room for improvement. The work also identifies the integration challenges and future work needed to realize automatic engine selection within QEMU.

Abstract

As more applications rely on virtualization and emulation for mission-critical tasks, the performance requirements of emulated and virtualized platforms continue to rise. Hardware virtualization is not available on all systems and cannot run guests built for a different CPU architecture, so software emulation is required in those cases. QEMU, the premier cross-architecture emulator for Linux and some BSD systems, currently performs dynamic binary translation (DBT) through an intermediate representation (IR) using its Tiny Code Generator (TCG). While translating through an IR allows QEMU to add new host and guest architectures quickly, it adds steps to the emulation pipeline that decrease performance. We construct a proof-of-concept emulator to demonstrate the slowdown caused by TCG's use of an IR; this emulator ran up to 35x faster than QEMU with TCG, indicating substantial room for improvement in QEMU's design. We propose expanding QEMU's two-tier engine system (Linux KVM versus TCG) with a middle tier that uses direct binary translation for commonly paired architectures such as RISC-V, x86, and ARM. This approach provides an adjustable trade-off between development effort and performance, depending on the needs of end users.
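
The three-tier choice described in the abstract can be illustrated with a small sketch. This is not code from the paper or from QEMU: the function `select_engine`, the `Engine` enum, the architecture strings, and the contents of `DIRECT_PAIRS` are all hypothetical names chosen for illustration. It assumes only what the abstract states: hardware virtualization applies when host and guest share an architecture, direct binary translation exists for a limited set of common pairings, and TCG remains the general fallback.

```python
from enum import Enum


class Engine(Enum):
    KVM = "kvm"        # hardware virtualization, same-architecture guests only
    DIRECT = "direct"  # proposed middle tier: direct binary translation
    TCG = "tcg"        # IR-based translation, works for any supported pairing


# Hypothetical (host, guest) pairings assumed to have a hand-written
# direct translator; the real set would depend on development effort.
DIRECT_PAIRS = {("x86_64", "riscv64"), ("aarch64", "riscv64")}


def select_engine(host, guest, kvm_available, direct_pairs=DIRECT_PAIRS):
    """Pick the fastest engine that can run `guest` code on `host`."""
    # Tier 1: hardware virtualization cannot cross architectures.
    if host == guest and kvm_available:
        return Engine.KVM
    # Tier 2: a dedicated translator exists for this exact pairing.
    if (host, guest) in direct_pairs:
        return Engine.DIRECT
    # Tier 3: fall back to the general IR-based translator.
    return Engine.TCG
```

Under these assumptions, adding a new pairing to the middle tier only extends `direct_pairs`; every other combination continues to run on TCG, which is how the design trades development effort against performance.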

Paper Structure

This paper contains 10 sections and 6 figures.

Figures (6)

  • Figure 1: QEMU memory architecture based on the official API example qemumemory
  • Figure 2: Overall TCG execution model, adapted from Gligor et al. (doi:10.1145/1629435.1629446)
  • Figure 3: TCG IR pipeline
  • Figure 4: riscv-um execution model and data pathway
  • Figure 5: Proposed QEMU engine choice structure
  • ...and 1 more figure