FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs
Xiuwei Shang, Guoqiang Chen, Shaoyin Cheng, Shikai Guo, Yanming Zhang, Weiming Zhang, Nenghai Yu
TL;DR
FoC addresses the challenge of understanding cryptographic functions in stripped binaries with two components: FoC-BinLLM, an LLM-based binary code semantic summarizer initialized from a CodeT5+-based model, and FoC-Sim, a binary code similarity engine that fuses semantic encodings, control-flow structure, and cryptographic features. The framework is paired with a cryptographic binary dataset and automated semantic labeling via a keyword-based discriminator. Because FoC-Sim builds change-sensitive representations, it can retrieve homologous implementations across versions. Empirically, FoC-BinLLM outperforms ChatGPT by 14.61% on ROUGE-L, and FoC-Sim achieves 52% higher Recall@1 than the previous best binary code similarity detection (BCSD) methods. Case studies show practical utility in virus analysis and firmware vulnerability detection. The work provides a public dataset and a scalable, domain-specific approach to cryptographic binary analysis aimed at real-world security applications.
Abstract
Analyzing the behavior of cryptographic functions in stripped binaries is a challenging but essential task. Cryptographic algorithms exhibit greater logical complexity than typical code, yet their analysis is unavoidable in areas such as virus analysis and legacy code inspection. Existing methods often rely on data or structural pattern matching, which limits their generalizability and requires substantial manual effort. In this paper, we propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. In FoC, we first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. However, the predictions of FoC-BinLLM are insensitive to minor changes, such as vulnerability patches. To address this limitation, we further build a binary code similarity model (FoC-Sim) on top of FoC-BinLLM to create change-sensitive representations, and use it to retrieve similar implementations of unknown cryptographic functions from a database. In addition, we construct a cryptographic binary dataset for evaluation and to facilitate further research in this domain, and we devise an automated method to create semantic labels for a large number of binary functions. Evaluation results demonstrate that FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score, and that FoC-Sim outperforms the previous best methods with a 52% higher Recall@1. Our method also shows practical ability in virus analysis and 1-day vulnerability detection.
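To make the retrieval setting and the Recall@1 metric concrete, the following is a minimal sketch, not the FoC-Sim implementation. It assumes each function has already been mapped to a fixed-length embedding and ranks database entries by cosine similarity. The vectors, the function labels, and the helper names (`cosine`, `retrieve`, `recall_at_k`) are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vec, database):
    """Return database labels ranked by descending similarity to the query."""
    ranked = sorted(database, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [label for label, _ in ranked]


def recall_at_k(queries, database, k=1):
    """Fraction of queries whose true label appears among the top-k results."""
    hits = sum(
        1 for true_label, vec in queries if true_label in retrieve(vec, database)[:k]
    )
    return hits / len(queries)


# Hand-made toy embeddings (assumption: a real system would use learned ones).
database = [
    ("aes_encrypt", [0.9, 0.1, 0.0]),
    ("sha256_update", [0.1, 0.9, 0.1]),
    ("rsa_sign", [0.0, 0.2, 0.9]),
]
queries = [
    ("aes_encrypt", [0.8, 0.2, 0.1]),
    ("sha256_update", [0.2, 0.8, 0.0]),
    ("rsa_sign", [0.7, 0.1, 0.3]),  # deliberately embedded closer to aes_encrypt
]

print(recall_at_k(queries, database, k=1))
print(recall_at_k(queries, database, k=2))
```

In this toy setup the third query is ranked incorrectly at position 1 but correctly at position 2, so Recall@1 is lower than Recall@2. A relative improvement such as the reported 52% compares this kind of score between two retrieval models on the same query set.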
