Table of Contents
Fetching ...

Fast, Fine-Grained Equivalence Checking for Neural Decompilers

Luke Dramko, Claire Le Goues, Edward J. Schwartz

TL;DR

Codealign introduces instruction-level equivalence alignments to evaluate neural decompilers, addressing the scarcity of reliable, fine-grained correctness metrics. It defines a formal framework based on SSA, value binding, and functional equivalence, then builds an inductive proof engine that uses data- and control-flow lemmas to align instructions across two functions. The approach is sound but incomplete and is validated against symbolic execution while delivering fast, scalable runtime performance; it additionally enables evaluation of variable-name quality via mapped correspondences. The method shows practical utility in assessing neural decompilers, with clear benefits for debugging hallucinations and guiding downstream tasks like patching, cloning, and plagiarism detection. Overall, codealign provides a rigorous, scalable, and actionable framework for fine-grained program equivalence in the neural decompilation setting.

Abstract

Neural decompilers are machine learning models that reconstruct the source code from an executable program. Critical to the lifecycle of any machine learning model is an evaluation of its effectiveness. However, existing techniques for evaluating neural decompilation models have substantial weaknesses, especially when it comes to showing the correctness of the neural decompiler's predictions. To address this, we introduce codealign, a novel instruction-level code equivalence technique designed for neural decompilers. We provide a formal definition of a relation between equivalent instructions, which we term an equivalence alignment. We show how codealign generates equivalence alignments, then evaluate codealign by comparing it with symbolic execution. Finally, we show how the information codealign provides-which parts of the functions are equivalent and how well the variable names match-is substantially more detailed than existing state-of-the-art evaluation metrics, which report unitless numbers measuring similarity.

Fast, Fine-Grained Equivalence Checking for Neural Decompilers

TL;DR

Codealign introduces instruction-level equivalence alignments to evaluate neural decompilers, addressing the scarcity of reliable, fine-grained correctness metrics. It defines a formal framework based on SSA, value binding, and functional equivalence, then builds an inductive proof engine that uses data- and control-flow lemmas to align instructions across two functions. The approach is sound but incomplete and is validated against symbolic execution while delivering fast, scalable runtime performance; it additionally enables evaluation of variable-name quality via mapped correspondences. The method shows practical utility in assessing neural decompilers, with clear benefits for debugging hallucinations and guiding downstream tasks like patching, cloning, and plagiarism detection. Overall, codealign provides a rigorous, scalable, and actionable framework for fine-grained program equivalence in the neural decompilation setting.

Abstract

Neural decompilers are machine learning models that reconstruct the source code from an executable program. Critical to the lifecycle of any machine learning model is an evaluation of its effectiveness. However, existing techniques for evaluating neural decompilation models have substantial weaknesses, especially when it comes to showing the correctness of the neural decompiler's predictions. To address this, we introduce codealign, a novel instruction-level code equivalence technique designed for neural decompilers. We provide a formal definition of a relation between equivalent instructions, which we term an equivalence alignment. We show how codealign generates equivalence alignments, then evaluate codealign by comparing it with symbolic execution. Finally, we show how the information codealign provides-which parts of the functions are equivalent and how well the variable names match-is substantially more detailed than existing state-of-the-art evaluation metrics, which report unitless numbers measuring similarity.
Paper Structure (28 sections, 9 equations, 6 figures, 2 tables, 1 algorithm)

This paper contains 28 sections, 9 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: A prediction by a machine learning model (left) and the reference solution (right). Some instructions in the prediction are equivalent to the reference; these are connected with boxes and lines. However, the prediction is subtly incorrect; it always returns the value of evaluating on the last item of the list, rather than the highest one found. Equivalent instructions are connected with boxes and lines.
  • Figure 2: An illustration of how codealign works. Fig. \ref{['fig:inference_a']} shows two examples (derived from the data in Section \ref{['sec:neuraldecompilation']}) and their canonicalized SSA representations. Fig. \ref{['fig:inference_d']} shows how, using pairs of SSA-representation edges as lemmas, various values in the examples can be proven equivalent.
  • Figure 3: A CFG (left) and CDG (right). There is a cycle of control dependencies between blocks 5 and 3. The red dashed line indicates a back-dependency, as does the CDG edge from 1 to 5.
  • Figure 4: Runtime performance of codealign on functions from GNU Coreutils with a quadratic regression trend line.
  • Figure 5: A function (Fig. \ref{['fig:original']}) from the test set in the experiment. Fig. \ref{['fig:decompiled']} shows the same function having been compiled, then decompiled. Machine learning models can be used to render decompiler output more readable, but they may produce semantically nonequivalent code. Fig. \ref{['fig:predicted']} shows an example. The alignment generated by codealign in Fig. \ref{['fig:ndalignmentexample']} can be used to both detect the hallucination, and evaluate the quality of the variable names in the decompiled function. Fig. \ref{['fig:ndalignmentexample']}, shoes the use of partial-loop (Section \ref{['sec:options']}) alignment to analyze why the loop does not align.
  • ...and 1 more figures