Precise Static Identification of Ethereum Storage Variables (Extended Version)

Sifis Lagouvardos, Yannis Bollanos, Michael Debono, Neville Grech, Yannis Smaragdakis

TL;DR

The paper tackles the challenge of reconstructing high-level on-chain storage structures from Ethereum bytecode, a task hampered by dynamic storage slot computations and the loss of source-level information. It introduces dyels, a recursive static analysis that overapproximates storage index values, identifies storage constructs, and infers value types, including arbitrarily nested mappings and dynamic arrays, while scaling to nearly all contracts. On a large, real-world dataset, dyels identifies the exact types of data structures with 98.6% precision and at least 92.6% recall, compared with 80.8% and 68.2% for the state-of-the-art VarLifter, and it even reveals storage patterns not exposed by compiler metadata. The approach yields practical benefits for security analysis, decompilation, and tooling, demonstrated by reentrancy guard detection and broad deployment across the Ethereum mainnet.

Abstract

Smart contracts are small programs that run autonomously on the blockchain, using it as their persistent memory. The predominant platform for smart contracts is the Ethereum VM (EVM). In EVM smart contracts, a problem with significant applications is to identify data structures (in blockchain state, a.k.a. "storage"), given only the deployed smart contract code. The problem has been highly challenging and has often been considered nearly impossible to address satisfactorily. (For reference, the latest state-of-the-art research tool fails to recover nearly all complex data structures and scales to under 50% of contracts.) Much of the complication is that the main on-chain data structures (mappings and arrays) have their locations derived dynamically through code execution. We propose sophisticated static analysis techniques to solve the identification of on-chain data structures with extremely high fidelity and completeness. Our analysis scales nearly universally and recovers deep data structures. Our techniques are able to identify the exact types of data structures with 98.6% precision and at least 92.6% recall, compared to a state-of-the-art tool managing 80.8% and 68.2% respectively. Strikingly, the analysis is often more complete than the storage description that the compiler itself produces, with full access to the source code.
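To make concrete what "locations derived dynamically through code execution" means, the following is a minimal sketch of the slot arithmetic that the Solidity documentation describes for mappings and dynamic arrays. It is not the paper's analysis. The function names are hypothetical, and the hash is a loudly labeled stand-in: the EVM uses Keccak-256, which differs from the standard-library SHA3-256 used here in its padding, so the numeric slots produced below will not match real on-chain slots. The sketch also assumes every key and array element fills exactly one 32-byte slot.

```python
import hashlib

SLOT_BYTES = 32          # EVM storage keys and values are 32-byte words
SLOT_SPACE = 2 ** 256    # storage slot indices wrap around modulo 2**256


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: stand-in only. The EVM hashes with Keccak-256, which is
    # NOT the same function as hashlib.sha3_256 (the padding differs).
    return hashlib.sha3_256(data).digest()


def word(value: int) -> bytes:
    # Left-pad an unsigned integer to one 32-byte big-endian word.
    return value.to_bytes(SLOT_BYTES, "big")


def mapping_slot(key: int, base_slot: int, h=stand_in_hash) -> int:
    # Value for `key` in a mapping declared at `base_slot`:
    # hash(pad32(key) ++ pad32(base_slot)).
    return int.from_bytes(h(word(key) + word(base_slot)), "big")


def array_element_slot(index: int, base_slot: int, h=stand_in_hash) -> int:
    # A dynamic array keeps its length at `base_slot`; its elements start
    # at hash(pad32(base_slot)). Assumes one element per slot.
    start = int.from_bytes(h(word(base_slot)), "big")
    return (start + index) % SLOT_SPACE


def nested_mapping_slot(outer_key: int, inner_key: int, base_slot: int) -> int:
    # mapping(K1 => mapping(K2 => V)): the inner mapping's "base slot" is
    # itself a computed hash, which is why nesting is hard to spot statically.
    return mapping_slot(inner_key, mapping_slot(outer_key, base_slot))


if __name__ == "__main__":
    print(hex(mapping_slot(key=0xABC, base_slot=1)))
    print(hex(nested_mapping_slot(0xABC, 7, base_slot=2)))
    print(hex(array_element_slot(index=3, base_slot=0)))
```

In deployed bytecode none of these helper boundaries exist: the hashing and additions appear as ordinary low-level instructions, so an analysis has to recognize the nesting pattern from the data flow alone.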

Paper Structure

This paper contains 32 sections, 26 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Example Smart Contract
  • Figure 2: Low-level Storage Layout Implementation of our example in Figure 1
  • Figure 3: Low-level code implementing the ERC-1967 standard.
  • Figure 4: Type domain definitions
  • Figure 5: Input relation definitions
  • ...and 11 more figures