Call graph discovery in binary programs from unknown instruction set architectures
Håvard Pettersen, Donn Morrison
TL;DR
This work tackles reverse engineering of binaries with unknown instruction set architectures (ISAs) by introducing a heuristic pipeline that detects candidate call and return opcodes and constructs call graphs from them. It defines the Opcode Candidacy Probability Score (OCP-Score), which ranks opcodes using static cues such as absolute or relative addressing and nearby epilogue patterns, so that call graphs can be extracted automatically from the highest-ranked candidates. The method is evaluated on a small multi-ISA dataset (architectures and programs including Chip8, MIPS, AArch64, OpenVPN, cURL, and Chipquarium). It shows promising opcode detection and plausible call graphs for fixed-length ISAs, while exposing limitations for variable-length architectures and noisy data. The result is a practical, low-dependency tool to assist reverse engineers and security analysts. Future directions include broader ISA support, NOP handling and disambiguation, and larger-scale validation.
Abstract
This study addresses the challenge of reverse engineering binaries from unknown instruction set architectures, a complex task with potential implications for software maintenance and cyber-security. We focus on detecting candidate call and return opcodes so that call graphs can be extracted automatically, which simplifies the reverse engineering process. Empirical testing on a small dataset of binary files from different architectures demonstrates that the approach can accurately detect specific opcodes even in noisy data. The method lays the groundwork for a valuable tool for cases where the reverse engineer has minimal a priori knowledge of the underlying instruction set architecture.
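As a rough illustration of ranking opcodes by static cues, the sketch below scores each candidate opcode by how often its operand points just past a known return word, which is a plausible function start. It is a toy under stated assumptions and not the OCP-Score itself: it assumes a Chip8-like layout (fixed 2-byte big-endian instructions, opcode in the top 4 bits, a 12-bit absolute address operand, load address 0x200), and the function name `toy_call_scores` and its parameters are hypothetical.

```python
from collections import defaultdict


def toy_call_scores(code: bytes, ret_word: int, base: int = 0x200) -> dict[int, float]:
    """Toy ranking, NOT the paper's OCP-Score.

    For each top-nibble opcode, return the fraction of its occurrences whose
    12-bit operand, read as an absolute address, lands on the instruction
    right after an occurrence of ret_word (an assumed return instruction).
    """
    # Assumption: fixed-length 2-byte big-endian instruction words.
    words = [int.from_bytes(code[i:i + 2], "big") for i in range(0, len(code) - 1, 2)]
    hits = defaultdict(int)
    total = defaultdict(int)
    for word in words:
        opcode, operand = word >> 12, word & 0x0FFF
        total[opcode] += 1
        offset = operand - base  # file offset of the target, if it is an address
        if offset >= 2 and offset % 2 == 0 and offset // 2 < len(words):
            # Cue: the word just before the target is a return (an epilogue).
            if words[offset // 2 - 1] == ret_word:
                hits[opcode] += 1
    return {op: hits[op] / total[op] for op in total}


# Hypothetical 16-byte program: two calls (0x2NNN), one jump (0x1NNN),
# two register loads (0x6XNN), and three returns (0x00EE).
program = bytes.fromhex("2208220c120400ee600100ee610200ee")
scores = toy_call_scores(program, ret_word=0x00EE)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # the best-scoring call candidate
```

On this toy input, opcode 0x2 (the Chip8 call) is the only opcode whose operands consistently point just past a return, so it ranks first. With an unknown ISA the return word would itself be a ranked candidate rather than a given input.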
