The Incredible Shrinking Context... in a Decompiler Near You

Sifis Lagouvardos, Yannis Bollanos, Neville Grech, Yannis Smaragdakis

TL;DR

This paper tackles the challenging problem of recovering high-level control flow from EVM bytecode, focusing on static-analysis-based decompilation. It introduces Shrnkr, which centers on shrinking context sensitivity (an enhanced context abstraction that aggressively trims the history kept in the private part of the context while preserving crucial call/return information), alongside block cloning and an incomplete global pre-analysis. The evaluation shows Shrnkr outperforming the prior state-of-the-art decompiler Elipmoc in scalability, precision, and completeness, and outperforming the symbolic-execution-based Heimdall-rs in end-to-end coverage on large real-world contracts, including a dedicated case study on hack contracts. The work delivers a practical, open-source decompiler that scales to modern Solidity/Yul pipelines and provides stronger guarantees for automated analyses and tooling around smart contracts.

Abstract

Decompilation of binary code has arisen as a highly important application in the space of Ethereum VM (EVM) smart contracts. Major new decompilers appear nearly every year and attain popularity, for a multitude of reverse-engineering or tool-building purposes. Technically, the problem is fundamental: it consists of recovering high-level control flow from a highly optimized continuation-passing-style (CPS) representation. Architecturally, decompilers can be built using either static analysis or symbolic execution techniques. We present Shrnkr, a static-analysis-based decompiler succeeding the state-of-the-art Elipmoc decompiler. Shrnkr achieves drastic improvements relative to the state of the art in all significant dimensions: scalability, completeness, and precision. Chief among the techniques employed is a new variant of static analysis context: shrinking context sensitivity. Shrinking context sensitivity performs deep cuts in the static analysis context, eagerly "forgetting" control-flow history, in order to leave room for further precise reasoning. We compare Shrnkr to state-of-the-art decompilers, both static-analysis- and symbolic-execution-based. In a standard benchmark set, Shrnkr scales to over 99.5% of contracts (compared to ~95%), covers (i.e., reaches and manages to decompile) 67% more code, and reduces key imprecision metrics by over 65%.
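To make the CPS point concrete: in EVM bytecode a private-function call is typically compiled as the caller pushing a return address (its continuation) onto the stack and jumping to the callee, which later "returns" by jumping to whatever address sits on the stack. The Python sketch below is a hypothetical toy model, not Shrnkr's analysis: the block addresses, the three-instruction language, and the helper names `run_block`, `join`, and `analyze` are our own assumptions. It analyzes one function called from two sites, once without context and once with a 1-limited "previous block" context, and records the jump targets resolved at the callee's return block.

```python
# Toy CPS model of two calls to one private function (hypothetical addresses).
# ("PUSH", v) pushes a constant, ("JUMP",) pops a target and jumps there,
# ("STOP",) halts. Every block ends in JUMP or STOP.
PROGRAM = {
    0x10: [("PUSH", 0x20), ("PUSH", 0x50), ("JUMP",)],  # call f, continuation 0x20
    0x20: [("PUSH", 0x40), ("PUSH", 0x50), ("JUMP",)],  # call f, continuation 0x40
    0x50: [("JUMP",)],                                  # f: return to stack top
    0x40: [("STOP",)],
}


def run_block(block, stack):
    """Abstractly execute one block; each stack slot is a frozenset of constants.

    Returns (possible jump targets, outgoing abstract stack).
    """
    stack = list(stack)
    for op in PROGRAM[block]:
        if op[0] == "PUSH":
            stack.append(frozenset([op[1]]))
        elif op[0] == "JUMP":
            target = stack.pop()
            return target, tuple(stack)
        else:  # STOP
            return frozenset(), tuple(stack)
    raise ValueError(f"block {block:#x} does not end in JUMP or STOP")


def join(a, b):
    """Slot-wise union of two abstract stacks of equal height."""
    assert len(a) == len(b), "toy model assumes equal stack heights"
    return tuple(x | y for x, y in zip(a, b))


def analyze(context_sensitive, entry=0x10):
    """Worklist analysis returning {(block, context): resolved jump targets}.

    The context is None when context-insensitive; otherwise it is the block
    we jumped from (a 1-limited "previous block" context).
    """
    start = (entry, None)
    states = {start: ()}
    worklist = [start]
    resolved = {}
    while worklist:
        key = worklist.pop()
        block, _ = key
        targets, out_stack = run_block(block, states[key])
        resolved[key] = targets
        for t in targets:
            succ = (t, block if context_sensitive else None)
            old = states.get(succ)
            new = out_stack if old is None else join(old, out_stack)
            if new != old:  # state grew: (re)analyze the successor
                states[succ] = new
                worklist.append(succ)
    return resolved


if __name__ == "__main__":
    print(analyze(False)[(0x50, None)])  # both continuations merged
    print(analyze(True)[(0x50, 0x10)], analyze(True)[(0x50, 0x20)])
```

Without context, the single abstract state for block `0x50` merges both continuations, so each call appears able to return to either site; with context, each call's return resolves to exactly one continuation. Keeping such distinctions while still scaling is the trade-off that the abstract's context-sensitivity discussion is about.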

Paper Structure (40 sections, 13 figures, 8 tables)

Figures (13)

  • Figure 1: Simple smart contract, used as running example.
  • Figure 2: Function selector logic for our example in Figure 1.
  • Figure 3: Private Function Call Patterns
  • Figure 4: Example: Shrinking context sensitivity contrasted, at each analyzed block, with the transactional context sensitivity of past work. Both context sensitivity algorithms have a maximum context depth of 4. The public function components of both algorithms are omitted because they remain unchanged in the transitions shown. Arrows to the right are calls; arrows to the left are returns. The analysis has initial information that should be kept precisely through the analyzed sub-graph: continuationA is applicable (e.g., it is kept in a certain stack location) if we reach the first analyzed block (0x1ca) with context 0xa. Transactional context sensitivity forgets this information by the time it analyzes the last block: the context is merely the blocks shown in the figure, with no trace of how the analysis got to them. In contrast, shrinking context sensitivity maintains the information: the context shown at the last block captures how we got to the first block.
  • Figure 5: Context constructor for shrinking context sensitivity. For ease of exposition, we use labeled records to distinguish the public part of the context (single element) from the private part (of $n$ elements), instead of merging both in a flat tuple of $n+1$ elements.
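The contrast drawn in the Figure 4 and Figure 5 captions can be approximated in a few lines of Python. This is a simplified sketch under our own assumptions, not the paper's context constructor: a context is modeled as a record with a single public element and a private tuple of at most 4 blocks (the maximum depth used in Figure 4); the transactional variant appends every call or return block and drops the oldest; the shrinking variant appends on calls and, on returns, cuts back the element pushed by the matching call. Apart from `0xa` and `0x1ca`, which appear in the Figure 4 caption, the block labels and the public component `"pub"` are hypothetical.

```python
from dataclasses import dataclass

MAX_DEPTH = 4  # maximum private-context depth, as in the Figure 4 example


@dataclass(frozen=True)
class Context:
    public: str          # public-function component (one element)
    private: tuple = ()  # private-function history, at most MAX_DEPTH blocks


def transactional(ctx, block, is_return):
    """Simplified past-work model: record every call or return block,
    dropping the oldest once the depth limit is exceeded."""
    # is_return is unused here; kept so both constructors share a signature.
    return Context(ctx.public, (ctx.private + (block,))[-MAX_DEPTH:])


def shrinking(ctx, block, is_return):
    """Simplified shrinking model: calls push their block, returns cut the
    context back by removing the element pushed by the matching call."""
    if is_return:
        return Context(ctx.public, ctx.private[:-1])
    return Context(ctx.public, (ctx.private + (block,))[-MAX_DEPTH:])


def run(step, ctx, transitions):
    """Apply a context constructor along a sequence of (block, is_return)."""
    for block, is_return in transitions:
        ctx = step(ctx, block, is_return)
    return ctx


# Hypothetical walk: two nested calls starting at 0x1ca, then two returns.
START = Context("pub", ("0xa",))
WALK = [("0x1ca", False), ("0x200", False), ("0x210", True), ("0x220", True)]

if __name__ == "__main__":
    print(run(transactional, START, WALK).private)  # 0xa has been evicted
    print(run(shrinking, START, WALK).private)      # 0xa is still present
```

With the same depth limit, the transactional model has evicted `0xa` after four transitions, while the shrinking model ends with `("0xa",)`, mirroring the caption's point that shrinking context sensitivity still records how the analysis reached block `0x1ca`. A real constructor must also decide which blocks are calls and which are returns; this sketch simply takes that as given.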