Building Reuse-Sensitive Control Flow Graphs (CFGs) for EVM Bytecode

Dingding Wang, Jianting He, Yizheng Yang, Lei Wu, Rui Chang, Yajin Zhou

TL;DR

The paper tackles the problem of static analysis on EVM bytecode being hampered by compiler-induced code reuse, which creates semantic ambiguities and redundant control-flow dependencies in conventional CFGs. It introduces Esuer, a tool that dynamically identifies code reuse through taint-driven reuse contexts during iterative CFG construction to produce reuse-sensitive CFGs. Esuer identifies eight reuse patterns, provides a three-step CFG recovery design (snapshot, taint-based context update, and cloning-based edge resolution), and demonstrates superior precision, speed, and downstream effectiveness versus six state-of-the-art tools on 10,000 contracts, including robust vulnerability detection for tx.origin and reentrancy. The approach significantly improves CFG fidelity and static-analysis reliability for smart contracts, enabling more accurate vulnerability detection and faster analysis at scale.

Abstract

The emergence of smart contracts brings security risks, exposing users to the threat of losing valuable cryptocurrencies and underscoring the urgency of meticulous scrutiny. Nevertheless, the static analysis of smart contracts in EVM bytecode faces obstacles due to flawed primitives resulting from code reuse introduced by compilers. Code reuse, a phenomenon where identical code executes in diverse contexts, engenders semantic ambiguities and redundant control-flow dependencies within reuse-insensitive CFGs. This work explores code reuse within EVM bytecode, outlines prevalent reuse patterns, and introduces Esuer, a tool that dynamically identifies code reuse when constructing CFGs. Leveraging taint analysis to dynamically identify reuse contexts, Esuer identifies code reuse by comparing multiple contexts for a basic block and replicates reused code for a reuse-sensitive CFG. Evaluation involving 10,000 prevalent smart contracts, compared with six leading tools, demonstrates Esuer's ability to notably refine CFG precision. It achieves an execution trace coverage of 99.94% and an F1-score of 97.02% for accurate identification of reused code. Furthermore, Esuer attains a success rate of 99.25%, with an average execution time of 1.06 seconds, outpacing tools generating reuse-insensitive CFGs. Esuer also proves effective in helping identify vulnerabilities such as tx.origin and reentrancy, achieving F1-scores of 99.97% and 99.67%, respectively.
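To make the difference between reuse-insensitive and reuse-sensitive CFGs concrete, here is a minimal Python sketch. It is not Esuer's implementation: the function names (`build_cfgs`, `reachable`), the block labels, and the choice of the return target as the "context" are illustrative assumptions. Esuer derives its reuse contexts through taint analysis, which this toy example does not attempt to model. The sketch only shows why cloning a block that is entered under several contexts removes spurious control-flow paths.

```python
from collections import defaultdict


def build_cfgs(call_sites):
    """Build two toy CFGs from (caller, shared_block, return_target) triples.

    Assumption: the return target stands in for the reuse context.
    Returns (insensitive_edges, sensitive_edges) as sets of (src, dst).
    """
    # Collect every context under which each shared block is entered.
    contexts = defaultdict(set)
    for _caller, shared, ret in call_sites:
        contexts[shared].add(ret)

    insensitive, sensitive = set(), set()
    for caller, shared, ret in call_sites:
        # Reuse-insensitive: one node per block, all edges merged.
        insensitive.add((caller, shared))
        insensitive.add((shared, ret))
        # Reuse-sensitive: clone the block once per distinct context.
        reused = len(contexts[shared]) > 1
        node = f"{shared}@{ret}" if reused else shared
        sensitive.add((caller, node))
        sensitive.add((node, ret))
    return insensitive, sensitive


def reachable(edges, start):
    """Return all nodes reachable from `start` (excluding `start` itself
    unless it lies on a cycle)."""
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    # Block S is reused by callers A and B with different return targets.
    sites = [("A", "S", "A_ret"), ("B", "S", "B_ret")]
    insensitive, sensitive = build_cfgs(sites)
    print(sorted(reachable(insensitive, "A")))
    print(sorted(reachable(sensitive, "A")))
```

In the merged graph, `B_ret` appears reachable from `A` even though no real execution takes that path; in the cloned graph each caller reaches only its own return target. This is the kind of redundant control-flow dependency the abstract attributes to reuse-insensitive CFGs.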

Paper Structure

This paper contains 26 sections, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Reuse-sensitive and reuse-insensitive CFGs and data analysis for the example. The reused BB is in gray background.
  • Figure 2: Two different jump patterns in EVM bytecode.
  • Figure 3: Reuse patterns without real control-flow structures.
  • Figure 4: Reuse patterns with real control-flow structures.
  • Figure 5: High-level architecture of Esuer.
  • ...and 5 more figures