Forklift: An Extensible Neural Lifter
Jordi Armengol-Estapé, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael F. P. O'Boyle
TL;DR
Forklift tackles the problem of porting binary code across diverse ISAs by learning to lift assembly directly to LLVM IR, a representation that existing compilers can lower to many target architectures. The authors propose an extensible, incremental learning framework built from a shared LLVM IR decoder and per-ISA assembly encoders. A new ISA is added by fine-tuning an encoder while the decoder stays frozen, so little retraining is needed. They train on a parallel dataset of millions of LLVM IR and assembly programs across x86, ARM, and RISC-V, and evaluate translations with an input/output-based accuracy harness. On two benchmark suites, Forklift translates more programs than both a state-of-the-art hand-written lifter and GPT-4. It also adapts to new compilers and ISAs more readily than hand-written rules, which suggests it is practical for cross-ISA software porting and optimization workflows.
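The encoder/decoder split can be illustrated with a toy Python model. This is a minimal sketch, not Forklift's implementation. `Module`, `NeuralLifter`, `add_isa`, and `train_step` are hypothetical names, and weights are plain floats instead of Transformer layers. Warm-starting the new encoder from an existing one is an assumption about what "fine-tuning the encoder" means here. The sketch shows only the bookkeeping: once the shared IR decoder is frozen, training on a new ISA updates that ISA's encoder and leaves the decoder untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Module:
    """Hypothetical stand-in for a Transformer sub-network: named weights plus a trainable flag."""
    name: str
    weights: List[float]
    trainable: bool = True


@dataclass
class NeuralLifter:
    """Toy model: one shared LLVM IR decoder and one assembly encoder per ISA."""
    decoder: Module
    encoders: Dict[str, Module] = field(default_factory=dict)

    def add_isa(self, isa: str, dim: int, init_from: Optional[str] = None) -> Module:
        """Register an encoder for a new ISA and freeze the shared IR decoder."""
        if init_from is not None:
            # Assumption: warm-start from an already trained encoder, then fine-tune it.
            weights = list(self.encoders[init_from].weights)
        else:
            weights = [0.0] * dim
        encoder = Module(f"enc-{isa}", weights)
        self.encoders[isa] = encoder
        self.decoder.trainable = False  # the decoder is shared and kept fixed from now on
        return encoder

    def trainable_modules(self, isa: str) -> List[Module]:
        """Modules that a training step on (assembly, IR) pairs from `isa` may update."""
        return [m for m in (self.encoders[isa], self.decoder) if m.trainable]


def train_step(model: NeuralLifter, isa: str, grads: Dict[str, List[float]], lr: float = 0.1) -> None:
    """One plain SGD update; frozen modules are skipped, so their weights never change."""
    for m in model.trainable_modules(isa):
        m.weights = [w - lr * g for w, g in zip(m.weights, grads[m.name])]


# Usage: train x86 jointly with the decoder, then add RISC-V with the decoder frozen.
model = NeuralLifter(decoder=Module("dec-llvm-ir", [1.0, 1.0]))
model.encoders["x86"] = Module("enc-x86", [0.5, 0.5])
train_step(model, "x86", {"enc-x86": [1.0, 1.0], "dec-llvm-ir": [1.0, 1.0]})
model.add_isa("riscv", dim=2, init_from="x86")
train_step(model, "riscv", {"enc-riscv": [1.0, 1.0], "dec-llvm-ir": [1.0, 1.0]})
print(model.decoder.weights)            # changed only by the x86 step
print(model.encoders["riscv"].weights)  # changed by the RISC-V step
```

Because every encoder feeds the same frozen decoder, adding an ISA in this sketch never disturbs what was already learned for existing ISAs; only the new encoder's parameters move.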
Abstract
The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators that map between their respective assembly languages. However, developing these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique in which source assembly code is translated to an architecture-independent intermediate representation (IR), such as LLVM IR, and a pre-existing compiler then recompiles the IR to the target ISA. Yet the hand-written rules these lifters employ are sensitive to the particular compiler and optimization level used to generate the code, and they require significant engineering effort to support each new ISA. We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder Transformer. We show how to incrementally add support for new ISAs by fine-tuning the assembly encoder and freezing the IR decoder, improving the overall accuracy and efficiency. We collect millions of parallel LLVM IR, x86, ARM, and RISC-V programs across compilers and optimization levels to train Forklift, and we set up an input/output-based accuracy harness. We evaluate Forklift on two challenging benchmark suites: it translates 2.5x more x86 programs than a state-of-the-art hand-written lifter and 4.4x more x86 programs than GPT-4, and it also enables translation from new ISAs.
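An input/output-based accuracy check could look roughly like the following hypothetical sketch. A real harness would compile and execute programs. Here, plain Python callables stand in for the original program and for its lifted-and-recompiled translation. The rule that a translation counts as correct only if it matches the reference on every test input without crashing is an assumption, and `io_equivalent` and `io_accuracy` are illustrative names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "program" here is any callable; a real harness would run compiled binaries instead.
Program = Callable[..., object]


def io_equivalent(reference: Program, candidate: Program, inputs: Sequence[tuple]) -> bool:
    """True if `candidate` returns the same output as `reference` on every input tuple."""
    for args in inputs:
        expected = reference(*args)  # the reference is assumed well-defined on its test inputs
        try:
            actual = candidate(*args)
        except Exception:
            return False  # a crashing translation counts as incorrect
        if actual != expected:
            return False
    return True


def io_accuracy(cases: Dict[str, Tuple[Program, Program, List[tuple]]]) -> float:
    """Fraction of programs whose translation passes `io_equivalent`."""
    if not cases:
        return 0.0
    passed = sum(io_equivalent(ref, cand, inputs) for ref, cand, inputs in cases.values())
    return passed / len(cases)


def ref_sum(xs):
    return sum(xs)


def good_sum(xs):
    # Written differently from ref_sum, but behaviorally identical.
    total = 0
    for x in xs:
        total += x
    return total


def bad_sum(xs):
    return sum(xs[1:])  # mimics a translation bug that skips the first element


tests = [([1, 2, 3],), ([],), ([-5],)]
cases = {
    "sum_ok": (ref_sum, good_sum, tests),
    "sum_bad": (ref_sum, bad_sum, tests),
}
print(io_accuracy(cases))  # 0.5
```

Judging translations by observable behavior rather than by textual similarity to compiler-generated IR means that, in this sketch, a translation written differently from the reference (like `good_sum`) still counts as correct.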
