
ReCopilot: Reverse Engineering Copilot in Binary Analysis

Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying

TL;DR

ReCopilot tackles the challenge of binary analysis under symbol stripping by introducing a domain‑expert LLM trained in three stages (continued pretraining, supervised fine‑tuning, direct preference optimization) and enhanced with static context from call graphs and data flow. It builds large-scale domain data (60B pretraining tokens, 36B pretraining corpus, 403K SFT examples, 2.4K DPO samples) and uses a generator–discriminator workflow to produce reasoning‑rich chains of thought (CoTs), augmented by test‑time scaling. Evaluation on a multi‑task benchmark covering function name recovery, variable name/type inference, and binary code summarization shows state‑of‑the‑art performance (an average improvement of ~13%) with a compact 7B model, though challenges remain in long chain‑of‑thought reasoning and output-format reliability. The work demonstrates the viability of domain‑specific training and context enhancement for scalable, interpretable AI assistance in binary analysis and points toward future improvements in RL, scaling, and agentic tool use for security analysts.
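
For orientation on the third training stage: the standard direct preference optimization (DPO) objective, as commonly stated, trains a policy \pi_\theta against a frozen reference model \pi_{ref} on preference triples (x, y_w, y_l), where y_w is the preferred and y_l the rejected response to prompt x, and \beta controls how far the policy may drift from the reference. This summary does not reproduce ReCopilot's own equations, so its exact loss may differ from this textbook form.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Minimizing this loss raises the likelihood ratio of preferred responses relative to rejected ones without training an explicit reward model.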

Abstract

Binary analysis plays a pivotal role in security domains such as malware detection and vulnerability discovery, yet it remains labor-intensive and heavily reliant on expert knowledge. General-purpose large language models (LLMs) perform well at program analysis on source code, while binary-specific LLMs remain underexplored. In this work, we present ReCopilot, an expert LLM designed for binary analysis tasks. ReCopilot integrates binary code knowledge through a meticulously constructed dataset spanning continued pretraining (CPT), supervised fine-tuning (SFT), and direct preference optimization (DPO) stages. It leverages variable data flow and call graphs to enhance context awareness and employs test-time scaling to improve reasoning capabilities. Evaluations on a comprehensive binary analysis benchmark demonstrate that ReCopilot achieves state-of-the-art performance in tasks such as function name recovery and variable type inference on decompiled pseudocode, outperforming both existing tools and LLMs by 13%. Our findings highlight the effectiveness of domain-specific training and context enhancement, while also revealing challenges in building very long chains of thought. ReCopilot represents a significant step toward automating binary analysis with interpretable and scalable AI assistance.
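
The abstract states that ReCopilot enriches its input with call-graph and data-flow context but does not describe how that context is assembled. The following is a minimal sketch of one plausible approach for the call-graph half only, assuming the call graph is available as a caller-to-callees dict and decompiled pseudocode is keyed by function name. It collects every function within a fixed number of caller/callee hops of the target and appends that function's pseudocode after the target's. The names `neighbors_within`, `build_context`, and `max_hops` are hypothetical, data-flow context is omitted, and this is not the authors' implementation.

```python
from collections import deque
from typing import Dict, List, Set


def neighbors_within(call_graph: Dict[str, List[str]], target: str, max_hops: int) -> List[str]:
    """Return functions within max_hops caller/callee edges of target (BFS order, target excluded)."""
    # Treat call edges as undirected so both callers and callees count as context.
    adj: Dict[str, Set[str]] = {}
    for caller, callees in call_graph.items():
        adj.setdefault(caller, set())
        for callee in callees:
            adj[caller].add(callee)
            adj.setdefault(callee, set()).add(caller)

    seen = {target}
    order: List[str] = []
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def build_context(target: str, pseudo_code: Dict[str, str],
                  call_graph: Dict[str, List[str]], max_hops: int = 1) -> str:
    """Hypothetical prompt builder: target pseudocode first, then neighboring functions' pseudocode."""
    parts = [f"// target function: {target}\n{pseudo_code[target]}"]
    for fn in neighbors_within(call_graph, target, max_hops):
        if fn in pseudo_code:  # skip functions with no decompiled body (e.g. imports)
            parts.append(f"// context function: {fn}\n{pseudo_code[fn]}")
    return "\n\n".join(parts)
```

For example, with `call_graph = {"main": ["sub_401000"], "sub_401000": ["sub_401200", "sub_401300"], "sub_401300": ["sub_401400"]}`, calling `neighbors_within(call_graph, "sub_401000", 1)` returns `["main", "sub_401200", "sub_401300"]`. Including callers as well as callees is a design choice: callers hint at how a function is used, callees at what it does. A practical system would likely also cap the context by a token budget.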

Paper Structure

This paper contains 20 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Pseudo Code
  • Figure 2: Source Code
  • Figure 4: An overview of the ReCopilot model building.
  • Figure 5: An example for demonstrating the pretraining data format and inner-shuffling.
  • Figure 6: The input-output template designed for binary analysis tasks in ReCopilot.
  • ...and 4 more figures