
Augmenting Smart Contract Decompiler Output through Fine-grained Dependency Analysis and LLM-facilitated Semantic Recovery

Zeqin Liao, Yuhong Nan, Zixu Gao, Henglong Liang, Sicheng Hao, Peifan Reng, Zibin Zheng

TL;DR

Solidity decompilers lose crucial source-code information. SmartHalo addresses this with a hybrid framework that combines static analysis with large language models. It constructs a fine-grained Dependency Graph capturing type, state, and control-flow relations, then uses LLMs guided by these dependencies to refine function boundaries, variable types, and contract attributes. The LLM output is checked for correctness through symbolic execution and SMT-based verification. Evaluation on 465 randomly selected smart contract methods shows substantial improvements over baseline decompilers, with GPT-4o mini further boosting performance and enabling recompilation of the optimized outputs. The work demonstrates practical impact for vulnerability detection and program comprehension in smart contracts and offers a generalizable methodology for SA+LLM-assisted decompiler optimization.

Abstract

A decompiler is a specialized reverse engineering tool extensively employed in program analysis tasks, particularly program comprehension and vulnerability detection. However, current Solidity smart contract decompilers face significant limitations in reconstructing the original source code. In particular, the bottleneck of SOTA decompilers lies in inaccurate method identification, incorrect variable type recovery, and missing contract attributes. These deficiencies hinder downstream tasks and understanding of the program logic. To address these challenges, we propose SmartHalo, a new framework that enhances decompiler output by combining static analysis (SA) and large language models (LLM). SmartHalo leverages the complementary strengths of SA's accuracy in control and data flow analysis and LLM's capability in semantic prediction. More specifically, SmartHalo constructs a new data structure, the Dependency Graph (DG), to extract semantic dependencies via static analysis. It then uses the DG to create prompts for LLM optimization. Finally, the correctness of LLM outputs is validated through symbolic execution and formal verification. Evaluation on a dataset consisting of 465 randomly selected smart contract methods shows that SmartHalo significantly improves the quality of the decompiled code compared to SOTA decompilers (e.g., Gigahorse). Notably, integrating GPT-4o with SmartHalo further enhances its performance, achieving precision rates of 87.39% for method boundaries, 90.39% for variable types, and 80.65% for contract attributes.
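
To make the "dependency graph, then prompt" step more concrete, the following is a minimal illustrative sketch and not SmartHalo's implementation. It assumes a graph whose edges carry one of three hypothetical labels (`type`, `state`, `control`), and it renders the edges reachable from a target variable into a plain-text prompt. The class `DependencyGraph`, the function `build_prompt`, the prompt wording, and the sample variable names are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical edge labels, chosen for this sketch only.
EDGE_KINDS = ("type", "state", "control")


@dataclass
class DependencyGraph:
    # edges[src] is a list of (kind, dst) pairs.
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, src, kind, dst):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self.edges[src].append((kind, dst))

    def related(self, start):
        """Return every (src, kind, dst) edge reachable from start, in BFS order."""
        seen, queue, out = {start}, [start], []
        while queue:
            node = queue.pop(0)
            for kind, dst in self.edges.get(node, []):
                out.append((node, kind, dst))
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return out


def build_prompt(graph, target, decompiled_snippet):
    """Render the dependencies of `target` plus the decompiled code as a prompt."""
    lines = [f"Recover the Solidity type of `{target}`.", "Known dependencies:"]
    for src, kind, dst in graph.related(target):
        lines.append(f"- {src} --{kind}--> {dst}")
    lines += ["Decompiled code:", decompiled_snippet]
    return "\n".join(lines)


# Toy input: a decompiler typed v1 as uint256 although it holds msg.sender.
dg = DependencyGraph()
dg.add("v1", "type", "msg.sender")
dg.add("v1", "state", "storage[0]")
dg.add("storage[0]", "control", "require(v1 == storage[0])")
prompt = build_prompt(dg, "v1", "uint256 v1 = msg.sender;")
print(prompt)
```

In a pipeline of the kind the abstract describes, the model's answer to such a prompt would still need to pass the symbolic-execution and formal-verification check before being accepted; that step is not modelled in this sketch.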

Paper Structure

This paper contains 21 sections, 4 equations, 10 figures, and 11 tables.

Figures (10)

  • Figure 1: Three motivating examples for illustrating the limitations of current decompiler output.
  • Figure 2: The decompiled code optimized by LLM for the instance in Fig. 1(b).
  • Figure 3: An optimization error reported by LLM inference in terms of program-behavior non-equivalence.
  • Figure 4: The overview of SmartHalo.
  • Figure 5: The syntax of expression for typing in Solidity.
  • ...and 5 more figures