Reusing Legacy Code in WebAssembly: Key Challenges of Cross-Compilation and Code Semantics Preservation
Sara Baradaran, Liyan Huang, Mukund Raghothaman, Weihang Wang
TL;DR
This work examines how reliably legacy C/C++ code can be reused by cross-compiling it to WebAssembly, and whether the resulting binaries preserve the original code semantics. The study has two phases: it first identifies the practical build-time challenges of porting code to Wasm, then evaluates semantic fidelity with a differential testing framework. The authors introduce WasmChecker, which uses CodeQL-based static analysis, automated build transformations, and virtual file system (VFS) file preloading to align WebAssembly builds with their native counterparts. Running 34,480 tests across 135 projects, they report 226 discrepancies (220 true positives and 6 false positives) and 11 new Emscripten bugs. The work contributes a dataset and open-source tooling for improving cross-compilation reliability and for guiding future work on Wasm compilers and runtimes. The findings suggest that WebAssembly's portability and performance come at some cost to ease of code reuse: careful configuration and code adjustments are often needed before native and Wasm binaries behave identically.
Abstract
WebAssembly (Wasm) has emerged as a powerful technology for executing high-performance code and reusing legacy code in web browsers. With its increasing adoption, ensuring the reliability of WebAssembly code becomes paramount. In this paper, we investigate how well WebAssembly compilers support code reuse. Specifically, we ask (1) what challenges arise when cross-compiling a high-level language codebase into WebAssembly and (2) how faithfully WebAssembly compilers preserve code semantics in the resulting binary. Through a study of 115 open-source codebases, we identify the key challenges in cross-compiling legacy C/C++ code into WebAssembly, highlighting the risks of silent miscompilation and compile-time errors. We categorize these challenges by root cause and propose corresponding solutions. We then introduce a differential testing approach, implemented in a framework named WasmChecker, to investigate the semantic equivalence of native x86-64 and WebAssembly binaries built from the same code. Using WasmChecker, we show that WebAssembly compilers do not necessarily preserve code semantics when cross-compiling high-level language code into WebAssembly, owing to differing implementations of standard libraries, unsupported system calls and APIs, WebAssembly's unique features, and compiler bugs. Furthermore, we have identified 11 new bugs in the Emscripten compiler toolchain, all confirmed by Emscripten developers. As proof of concept, we make our framework and the collected dataset of open-source codebases publicly available.
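The differential testing idea described above can be illustrated with a minimal sketch. This is not WasmChecker's implementation: it only assumes that each test can be launched as a command (one for the native build, one for the Wasm build), and it treats exit code and standard output as the observable behavior to compare. The names `RunResult`, `run`, and `differential_test` are hypothetical and introduced here for illustration only.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    """Observable behavior of one test execution."""
    exit_code: int
    stdout: bytes


def run(cmd, stdin_data=b"", timeout=10):
    # Execute one build of the test and capture what we compare on.
    # stderr is discarded here; a stricter oracle could compare it too.
    proc = subprocess.run(
        cmd,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return RunResult(proc.returncode, proc.stdout)


def differential_test(native_cmd, wasm_cmd, inputs):
    # Feed the same inputs to both builds and collect every input
    # on which their observable behavior differs.
    discrepancies = []
    for data in inputs:
        native = run(native_cmd, data)
        wasm = run(wasm_cmd, data)
        if native != wasm:
            discrepancies.append((data, native, wasm))
    return discrepancies
```

In practice the two commands might look like `["./build-native/run_tests"]` and `["node", "build-wasm/run_tests.js"]`; both paths are placeholders, not paths from the paper. A reported discrepancy is only a candidate: as the 6 false positives in the study indicate, each one still needs to be inspected before it is attributed to the compiler or runtime.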
