LeanBin: Harnessing Lifting and Recompilation to Debloat Binaries
Igor Wodiany, Antoniu Pop, Mikel Luján
TL;DR
LeanBin tackles binary debloating by combining dynamic execution tracing with heuristic-free static CFG analysis. Together they produce a structured, liftable representation that can be recompiled into LLVM IR and a debloated binary. The approach balances precision and performance. It debloats both applications and shared libraries on AArch64 with modest run-time overhead and substantially reduces gadget count and code size. On the SPEC CPU2006 INT benchmarks and a SQLite-based use case, LeanBin reduces the attack surface and, for specialized builds, improves speed, including in multi-threaded and cross-library scenarios. The work advances practical binary specialization in two ways. It integrates run-time traces with heuristic-free static augmentation, and it reuses existing compiler infrastructure for recompilation. The tool is available as open source for broader adoption.
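The combination of dynamic tracing with heuristic-free static CFG analysis can be illustrated with a minimal Python sketch. It is not LeanBin's implementation. The trace format (a list of executed basic-block start addresses), the hypothetical `Terminator` record, and the rule for which edges count as statically known are all assumptions made for illustration.

The sketch keeps every edge observed in the trace. It then adds only edges whose targets are fixed by the instruction encoding of traced blocks, meaning direct branch targets and fall-through paths. Returns and indirect branches keep just their traced successors, because recovering any other targets would require heuristics.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Terminator:
    """Hypothetical summary of the last instruction of a basic block."""
    kind: str                          # "cond", "jump", "call", "ret" or "indirect"
    target: Optional[int] = None       # encoded target of a direct branch/call
    fallthrough: Optional[int] = None  # next block (not-taken path or return site)


def traced_edges(trace: list[int]) -> set[tuple[int, int]]:
    """Edges observed at run time: consecutive block addresses in the trace."""
    return set(zip(trace, trace[1:]))


def static_edges(block: int, term: Terminator) -> set[tuple[int, int]]:
    """Edges fixed by the instruction encoding alone, with no guessing."""
    edges: set[tuple[int, int]] = set()
    if term.kind in ("cond", "jump", "call") and term.target is not None:
        edges.add((block, term.target))
    if term.kind in ("cond", "call") and term.fallthrough is not None:
        edges.add((block, term.fallthrough))
    # "ret" and "indirect": targets are not encoded in the instruction,
    # so only the successors seen in the trace are kept.
    return edges


def build_cfg(trace: list[int], terms: dict[int, Terminator]) -> set[tuple[int, int]]:
    """Traced edges augmented with the statically known edges of traced blocks."""
    edges = traced_edges(trace)
    for block in set(trace):
        term = terms.get(block)
        if term is not None:
            edges |= static_edges(block, term)
    return edges


# Illustrative input: 0x00 checks an error condition (error path 0x50 is never
# taken), then 0x10/0x20 form a loop that exits to 0x30.
EXAMPLE_TERMS = {
    0x00: Terminator("cond", target=0x50, fallthrough=0x10),
    0x10: Terminator("cond", target=0x30, fallthrough=0x20),
    0x20: Terminator("jump", target=0x10),
    0x30: Terminator("ret"),
}
EXAMPLE_TRACE = [0x00, 0x10, 0x20, 0x10, 0x30]

if __name__ == "__main__":
    for src, dst in sorted(build_cfg(EXAMPLE_TRACE, EXAMPLE_TERMS)):
        print(f"{src:#04x} -> {dst:#04x}")
```

In this example the error path at `0x50` never executes, but the static pass still records the edge to it. The resulting CFG is therefore not restricted to exactly the traced path. The summary does not say how the actual tool handles code reached only through such edges.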
Abstract
To reduce the sources of potential exploits, binary debloating or specialization tools remove unnecessary code from binaries. This paper presents LeanBin, a new binary debloating and specialization tool that harnesses lifting and recompilation based on observed execution traces. The dynamically recorded execution traces capture the subset of instructions and control flow of the application binary that is required for a given set of inputs. This initial control flow is then augmented using heuristic-free static analysis, so that the input space is not excessively restricted. Further structuring of the control flow and translation of binary instructions into a subset of C enable lightweight generation of code that can be recompiled to obtain LLVM IR and a new debloated binary. Unlike most debloating approaches, LeanBin can debloat both the application and its shared libraries, while reusing existing compiler infrastructure. Additionally, unlike existing binary lifters, it neither relies on the potentially unsound heuristics used by static lifters nor suffers from the long execution times of existing dynamic lifters. Instead, LeanBin combines heuristic-free static analysis with dynamic analysis. Lifting and debloating the SPEC CPU2006 INT benchmarks takes a geomean of 1.78× the native execution time, and the debloated binaries run with a geomean overhead of 1.21×. Relative to the original binary, the percentage of remaining gadgets has a geomean between 24.10% and 30.22%, depending on the debloating strategy, and the code size can be as low as 53.59%. For the SQLite use case, LeanBin debloats a binary together with its shared library. The resulting debloated binary runs up to 1.24× faster and retains only 3.65% of the gadgets.
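The abstract reports its results as geometric means of ratios normalized to a baseline. The baseline is native run time for the timing results and the original binary for the gadget percentages. The sketch below shows how such a figure is computed, assuming per-benchmark measurements are available as plain lists. The helper names are hypothetical, and the input values are illustrative placeholders, not data from the paper.

```python
from __future__ import annotations

import math


def geomean(values: list[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized(measured: list[float], baseline: list[float]) -> list[float]:
    """Per-benchmark ratio of a measurement to its baseline (e.g. native time)."""
    if len(measured) != len(baseline):
        raise ValueError("measured and baseline must have the same length")
    return [m / b for m, b in zip(measured, baseline)]


if __name__ == "__main__":
    native = [10.0, 20.0]      # illustrative native run times (seconds)
    debloated = [20.0, 160.0]  # illustrative run times of the rebuilt binaries
    print(f"{geomean(normalized(debloated, native)):.2f}x")  # prints 4.00x
```

The geometric mean is the usual choice for normalized ratios because a 2× slowdown on one benchmark and a 2× speedup on another average to exactly 1×. An arithmetic mean of the same two ratios would report 1.25×.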
