Is This the Same Code? A Comprehensive Study of Decompilation Techniques for WebAssembly Binaries

Wei-Cheng Wu, Yutian Yan, Hallgrimur David Egilsson, David Park, Steven Chan, Christophe Hauser, Weihang Wang

TL;DR

This paper presents a novel framework for empirically evaluating C-based decompilers along several dimensions, including correctness, readability, and structural similarity, and reports observations on the characteristics and constraints of existing decompiled code.

Abstract

WebAssembly (WASM) is a low-level bytecode language designed for client-side execution in web browsers. As WASM gains widespread adoption and raises security concerns, the need for decompilation techniques that recover high-level source code from WASM binaries has grown. However, little research has assessed the quality of code decompiled from WASM. This paper aims to fill this gap with a comprehensive comparative analysis of C code decompiled from WASM binaries against the output of state-of-the-art native binary decompilers. We present a novel framework for empirically evaluating C-based decompilers from various aspects, including correctness, readability, and structural similarity. The proposed metrics are validated for practicality in decompiler assessment and provide observations on the characteristics and constraints of existing decompiled code. This in turn contributes to bolstering the security and reliability of software systems that rely on WASM and native binaries.

Paper Structure

This paper contains 21 sections, 3 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of methodology workflow
  • Figure 2: An example of AST comparison
  • Figure 3: Readability and structural similarity results, consistent with the readability results and similarity results tables.