Flexible Non-intrusive Dynamic Instrumentation for WebAssembly

Ben L. Titzer, Elizabeth Gilbert, Bradley Wei Jie Teo, Yash Anand, Kazuyuki Takayama, Heather Miller

TL;DR

This paper presents the first non-intrusive dynamic instrumentation system for WebAssembly, implemented in the open-source Wizard Research Engine. It offers a flexible, complete hierarchy of instrumentation primitives that support building high-level, complex analyses from low-level, programmable probes.

Abstract

A key strength of managed runtimes over hardware is the ability to gain detailed insight into the dynamic execution of programs with instrumentation. Analyses such as code coverage, execution frequency, tracing, and debugging are all made easier in a virtual setting. As a portable, low-level bytecode, WebAssembly offers inexpensive in-process sandboxing with high performance. Yet to date, Wasm engines have offered little insight into executing programs: at best they support bytecode-level stepping and basic source maps, with no instrumentation capabilities. In this paper, we present the first non-intrusive dynamic instrumentation system for WebAssembly in the open-source Wizard Research Engine. Our design offers a flexible, complete hierarchy of instrumentation primitives that support building high-level, complex analyses in terms of low-level, programmable probes. In contrast to emulation or machine code instrumentation, injecting probes at the bytecode level increases expressiveness and vastly simplifies the implementation by reusing the engine's JIT compiler, interpreter, and deoptimization mechanism rather than building new ones. Wizard supports both dynamic insertion and removal of instrumentation while providing consistency guarantees, which is key to composing multiple analyses without interference. We detail a fully featured implementation in a high-performance multi-tier Wasm engine, show novel optimizations specifically designed to minimize instrumentation overhead, and evaluate performance characteristics under load from various analyses. This design is well suited for production engine adoption, as probes can be implemented to have no impact on production performance when not in use.

Paper Structure (62 sections, 7 figures)

Figures (7)

  • Figure 1: Illustration of instrumentation in the interpreter. Global probes can be inserted into the interpreter loop and local probes are implemented via bytecode overwriting. The FrameAccessor API allows a probe programmatic access to the state in the Wasm frame.
  • Figure 2: Code generated by Wizard's baseline JIT for different types of M-code implemented with probes. The machine code sequence for generic probes is more general than for probes that only need the top-of-stack value, versus a fully-intrinsified counter probe.
  • Figure 3: Average relative execution time for the hotness monitor (left) and branch monitor (right), when implemented with local probes and when implemented with a global probe on the PolyBenchC suite. Points above the bars denote number of probe fires.
  • Figure 4: Average relative execution times for the hotness (left) and branch monitors (right), with and without probe intrinsification on the PolyBenchC suite. Ratios are relative to uninstrumented JIT execution time. Points above the bars denote number of probe fires.
  • Figure 5: Execution time decomposition of hotness (left) and branch monitors (right) into M-code and probe dispatch overhead with and without probe intrinsification on the PolyBenchC suite. The cross-hatched regions represent overhead saved by intrinsification.
  • ...and 2 more figures
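
To make the Figure 1 caption concrete, the following is a minimal sketch of the two probe kinds it names: a global probe fired from the interpreter loop, and a local probe implemented by overwriting a bytecode. It is a toy stack machine written for illustration only, not Wizard's implementation or API. The opcodes and the names ToyInterpreter, CountProbe, insert_local, and remove_local are all hypothetical, and the sketch does not model the FrameAccessor API, the JIT, probe intrinsification, or Wizard's consistency guarantees.

```python
# Toy stack machine; opcodes are hypothetical, not real Wasm opcodes.
PUSH, ADD, DUP, JNZ, HALT, PROBE = range(6)


class CountProbe:
    """A counter probe: its only action is to increment a count."""

    def __init__(self):
        self.count = 0

    def fire(self, pc, stack):
        self.count += 1


class ToyInterpreter:
    def __init__(self, code):
        self.code = list(code)    # (opcode, operand) pairs
        self.saved = {}           # pc -> original instruction that was overwritten
        self.local_probes = {}    # pc -> probes attached at that pc
        self.global_probes = []   # probes fired before every instruction

    def insert_local(self, pc, probe):
        # First probe at this pc: save the original and overwrite it.
        if pc not in self.saved:
            self.saved[pc] = self.code[pc]
            self.code[pc] = (PROBE, 0)
            self.local_probes[pc] = []
        self.local_probes[pc].append(probe)

    def remove_local(self, pc, probe):
        self.local_probes[pc].remove(probe)
        # Last probe removed: restore the original instruction.
        if not self.local_probes[pc]:
            del self.local_probes[pc]
            self.code[pc] = self.saved.pop(pc)

    def run(self):
        stack, pc = [], 0
        while True:
            for probe in list(self.global_probes):
                probe.fire(pc, stack)
            op, arg = self.code[pc]
            if op == PROBE:
                # Fetch the original first, so a probe may remove itself.
                original = self.saved[pc]
                for probe in list(self.local_probes[pc]):
                    probe.fire(pc, stack)
                op, arg = original
            if op == PUSH:
                stack.append(arg)
                pc += 1
            elif op == ADD:
                b = stack.pop()
                a = stack.pop()
                stack.append(a + b)
                pc += 1
            elif op == DUP:
                stack.append(stack[-1])
                pc += 1
            elif op == JNZ:
                pc = arg if stack.pop() != 0 else pc + 1
            elif op == HALT:
                return stack


# Count down from 3; pc 1 is the loop head.
program = [(PUSH, 3), (PUSH, -1), (ADD, 0), (DUP, 0), (JNZ, 1), (HALT, 0)]
interp = ToyInterpreter(program)
loop_head = CountProbe()     # local probe: counts executions of one instruction
every_instr = CountProbe()   # global probe: counts every instruction

interp.insert_local(1, loop_head)
interp.global_probes.append(every_instr)
result = interp.run()
interp.remove_local(1, loop_head)   # the original bytecode is restored
```

In this sketch the program's result is the same with or without probes, and removing the last local probe at a location restores the original instruction. An uninstrumented location costs nothing extra beyond the global-probe check, which is the intuition behind local probes being cheaper than a global probe for sparse analyses.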