Wasm-R3: Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks
Doehyun Baek, Jakob Getz, Yusung Sim, Daniel Lehmann, Ben L. Titzer, Sukyoung Ryu, Michael Pradel
TL;DR
Wasm-R3 introduces the first record-replay framework for WebAssembly, designed to produce realistic and standalone benchmarks by instrumenting Wasm modules to record host interactions, reducing the resulting traces, and replaying them as self-contained artifacts. The method separates recording from replay via a replay IR and applies optimization passes to minimize trace and code size while preserving fidelity to the original execution. Across 27 real-world Wasm applications, Wasm-R3 achieves accurate replay benchmarks that run on multiple engines, with record overhead typically under 8× and replay code contributing a negligible fraction of execution time. The work provides a practical benchmark suite, Wasm-R3-Bench, and open-source tooling to accelerate realistic Wasm performance tuning and cross-engine benchmarking.
Abstract
WebAssembly (Wasm for short) brings a new, powerful capability to the web as well as Edge, IoT, and embedded systems. Wasm is a portable, compact binary code format with high performance and robust sandboxing properties. As Wasm applications grow in size and importance, the complex performance characteristics of diverse Wasm engines demand robust, representative benchmarks for proper tuning. Stopgap benchmark suites, such as PolyBenchC and libsodium, continue to be used in the literature, though they are known to be unrepresentative. Porting more complex suites remains difficult because Wasm lacks many system APIs, and extracting real-world Wasm benchmarks from the web is difficult due to complex host interactions. To address this challenge, we introduce Wasm-R3, the first record-and-replay technique for Wasm. Wasm-R3 transparently injects instrumentation into Wasm modules to record an execution trace from inside the module, then reduces the execution trace via several optimizations, and finally produces a replay module that is executable standalone, without any host environment, on any engine. The benchmarks created by our approach are (i) realistic, because the approach records real-world web applications, (ii) faithful to the original execution, because the replay benchmark includes the unmodified original code, only adding emulation of host interactions, and (iii) standalone, because the replay benchmarks run on any engine. Applying Wasm-R3 to web-based Wasm applications in the wild demonstrates the correctness of our approach as well as the effectiveness of our optimizations, which reduce the recorded traces by 99.53 percent and the size of the replay benchmark by 9.98 percent. We release the resulting benchmark suite of 27 applications, called Wasm-R3-Bench, to the community, to inspire a new generation of realistic and standalone Wasm benchmarks.
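To make the three phases concrete, the following is a minimal toy sketch in Python, not the Wasm-R3 implementation. It assumes a hypothetical guest function standing in for the unmodified Wasm code, a hypothetical host import named `read_input`, and a hypothetical event format of (import name, returned value). Run-length encoding stands in for the trace optimizations, whose actual design is not described in this abstract.

```python
from collections import defaultdict, deque

# Hypothetical event format for this sketch: (import name, returned value).


def record(host, trace):
    """Record: wrap each host import so every call's result is logged."""
    def wrap(name, fn):
        def logged(*args):
            result = fn(*args)
            trace.append((name, result))
            return result
        return logged
    return {name: wrap(name, fn) for name, fn in host.items()}


def reduce_trace(trace):
    """Reduce: run-length encode consecutive identical events.

    This is a stand-in optimization chosen for illustration only.
    """
    reduced = []
    for event in trace:
        if reduced and reduced[-1][0] == event:
            reduced[-1] = (event, reduced[-1][1] + 1)
        else:
            reduced.append((event, 1))
    return reduced


def replay_imports(reduced):
    """Replay: build host-free stubs that return recorded results in order."""
    queues = defaultdict(deque)
    for (name, result), count in reduced:
        queues[name].extend([result] * count)

    def stub(name):
        return lambda *args: queues[name].popleft()

    return {name: stub(name) for name in queues}


def guest(imports):
    """Toy stand-in for the unmodified guest code: sums three host inputs."""
    return sum(imports["read_input"]() for _ in range(3))
```

In this sketch, running `guest` once with `record(...)`-wrapped imports fills the trace, and running it again with `replay_imports(reduce_trace(trace))` reproduces the same result without touching the real host. The guest code is the same in both runs, which mirrors the faithfulness argument above: only the host side is emulated.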
