Context-Guided Decompilation: A Step Towards Re-executability

Xiaohan Wang, Yuxin Hu, Kevin Leach

TL;DR

This work tackles the problem of producing re-executable high-level source code from binaries, especially when aggressive compiler optimizations obscure semantics. It introduces ICL4Decomp, a hybrid in-context learning framework with two complementary strategies: retrieved-exemplar prompting (ICL4D-R) and optimization-rule prompting (ICL4D-O). By combining a large, categorized corpus of assembly-source pairs with category-aware retrieval and rule-based guidance on compiler optimizations, the approach improves re-executability by around 40% on average over state-of-the-art baselines across GCC/Clang and O0–O3. The results show improved structural and semantic reconstruction and robustness to program size and complexity, with the evaluation focused on function-level decompilation. Overall, ICL4Decomp moves decompilation from readable code toward reliably executable code without model retraining, enabling more trustworthy binary analysis workflows.

Abstract

Binary decompilation plays an important role in software security analysis, reverse engineering, and malware understanding when source code is unavailable. However, existing decompilation techniques often fail to produce source code that can be successfully recompiled and re-executed, particularly for optimized binaries. Recent advances in large language models (LLMs) have enabled neural approaches to decompilation, but the generated code is typically only semantically plausible rather than truly executable, which limits its practical reliability. These shortcomings arise from compiler optimizations and the loss of semantic cues in compiled code, which LLMs struggle to recover without contextual guidance. To address this challenge, we propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code. We evaluate our method across multiple datasets, optimization levels, and compilers, demonstrating around 40% improvement in re-executability over state-of-the-art decompilation methods while maintaining robustness.

Paper Structure

This paper contains 46 sections, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: System overview for in-context decompilation.
  • Figure 2: Distribution shift of error categories before and after applying in-context learning.
  • Figure 3: Qualitative example: Ground-truth vs. decompilations from three methods.
  • Figure 4: Re-execution success rate across functions of varying cyclomatic complexity and lines of code for HumanEval-Decompile (top) and ExeBench (bottom).