PCodeTrans: Translate Decompiled Pseudocode to Compilable and Executable Equivalent

Yuxin Cui, Zeyu Gao, Shuxian He, Siliang Qin, Chao Zhang

Abstract

Decompilation is foundational to binary analysis, yet conventional tools prioritize human readability over strict recompilability and verifiable runtime correctness. While recent LLM-based approaches attempt to refine decompiled pseudocode, they typically either optimize solely for readability or rely on static analysis for evaluation. This makes them prone to "semantic hallucinations" that compromise accuracy and fail to resolve actual runtime failures. For critical tasks like software modernization and vulnerability remediation, recovered code must not only compile but replicate the original binary's behavior. We present PCodeTrans, a feedback-driven framework that bridges the gap between decompilation, recompilation, and rigorous function-level dynamic validation. After extracting a minimal yet coherent context to guarantee recompilability, PCodeTrans employs an in situ substitutable engine to hot-swap the compiled function directly into the unmodified binary, natively preserving its authentic execution context and global dependencies. Guided by fine-grained differential tracing, PCodeTrans generates precise runtime feedback to iteratively guide an LLM in repairing semantic discrepancies. Evaluated on Coreutils and Binutils, PCodeTrans achieves unprecedented recovery performance when rectifying raw Hex-Rays outputs, attaining 100% function-level compilability on unstripped binaries alongside 99.55% and 99.89% test-validated behavioral consistency, respectively. In doing so, it resolves 76.56% and 79.74% of logic errors exposed by official test suites. Exhibiting exceptional resilience, PCodeTrans maintains over 96% behavioral consistency even on fully stripped binaries. By significantly outperforming all existing baselines, PCodeTrans paves a practical path to reliably translate decompiled pseudocode into compilable and executable equivalents.
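The feedback-driven loop described in the abstract can be pictured with a minimal sketch. Everything here is hypothetical: `compile_fn`, `run_tests`, and `ask_llm` are placeholder callables standing in for the compiler, the test run against the substituted binary, and the LLM, and `max_rounds` is an assumed budget. The paper's actual interfaces are not given in this section.

```python
from typing import Callable, Optional


def repair_loop(
    pseudocode: str,
    compile_fn: Callable[[str], Optional[str]],  # hypothetical: compiler error text, or None on success
    run_tests: Callable[[str], Optional[str]],   # hypothetical: runtime feedback text, or None if tests pass
    ask_llm: Callable[[str, str], str],          # hypothetical: (source, feedback) -> revised source
    max_rounds: int = 5,                         # assumed repair budget, not taken from the paper
) -> Optional[str]:
    """Return a source that compiles and passes the tests, or None if the budget runs out."""
    source = pseudocode
    for attempt in range(max_rounds + 1):
        # Stage 1: the candidate must compile before it can be executed.
        feedback = compile_fn(source)
        if feedback is None:
            # Stage 2: only a compilable candidate is validated dynamically.
            feedback = run_tests(source)
        if feedback is None:
            return source
        if attempt < max_rounds:
            # Hand the most recent feedback to the model and try again.
            source = ask_llm(source, feedback)
    return None
```

The sketch keeps compile-time and runtime feedback in one loop so that a repair which fixes behavior but breaks compilation is caught on the next round.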

Paper Structure (46 sections, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Overview of the PCodeTrans workflow. (1) Compile functions with sufficient context: extract dependencies from the original binary and decompiled pseudocode, applying compiler-guided LLM repair to build compilable functions as dynamic libraries. (2) In-situ substitutable execution: seamlessly execute substitute functions inside the original binary using hook patches for function redirection and patched GOT entries for external symbol relocation. (3) Runtime feedback-guided repair: evaluate the substituted binary with official test suites. Failures trigger sanitizer diagnostics or breakpoint-matched differential tracing, providing fine-grained feedback for the LLM to iteratively repair the function.
  • Figure 2: Construction of the address mapping table. PCodeTrans aligns symbols from the original binary with those in the compiled dynamic module to enable bidirectional runtime relocation.
  • Figure 3: Workflow of breakpoint-matched differential tracing.
  • Figure 4: Runtime bidirectional relocation by the relocation engine.
  • Figure 5: Case studies of runtime feedback-guided repair, including: (a) ASan-guided memory repair, (b) BP-Diff-guided logic repair, and (c) BP-Diff-guided memory layout repair.
  • ...and 4 more figures
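
Figure 3 names breakpoint-matched differential tracing, but this section gives no detail on it. As an illustrative guess at the general idea only, the sketch below compares two traces recorded at the same breakpoints, one from the original function and one from its substitute, and reports the first point where they differ. The trace representation (a list of `(label, state)` pairs) and the name `first_divergence` are assumptions, not the paper's design.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed trace format: one (breakpoint label, observed variable state) pair per hit.
Trace = List[Tuple[str, Dict[str, Any]]]


def first_divergence(original: Trace, substitute: Trace) -> Optional[str]:
    """Describe the first mismatch between two traces, or return None if they agree."""
    for index, (ref, new) in enumerate(zip(original, substitute)):
        ref_label, ref_state = ref
        new_label, new_state = new
        if ref_label != new_label:
            # The two runs reached different breakpoints: control flow diverged.
            return f"step {index}: control flow differs ({ref_label} vs {new_label})"
        # Collect every variable whose value differs at this breakpoint.
        diff = {
            key: (ref_state.get(key), new_state.get(key))
            for key in sorted(set(ref_state) | set(new_state))
            if ref_state.get(key) != new_state.get(key)
        }
        if diff:
            return f"step {index} at {ref_label}: {diff}"
    if len(original) != len(substitute):
        return f"trace length differs ({len(original)} vs {len(substitute)})"
    return None
```

A message such as the one returned here is the kind of localized feedback that could be passed back to the model instead of a bare test failure.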