Resolving Indirect Calls in Binary Code via Cross-Reference Augmented Graph Neural Networks

Haotian Zhang; Kun Liu; Cristian Garces; Chenke Luo; Yu Lei; Jiang Ming

TL;DR

This work tackles the problem of resolving indirect call targets in stripped binaries, which hinders sound inter-procedural analysis. It introduces CupidCall, which augments control flow graphs (CFGs) with cross-reference information and relies on compiler-level type analysis to produce high-quality training data, feeding a heterogeneous graph neural network (GNN) that predicts indirect call (icall) targets. CupidCall achieves an F1 score of 95.2% on real-world binaries, outperforming the previous state of the art (Callee, 89.9%), and remains robust across optimization levels while improving the granularity of binary-level control-flow integrity (CFI). The approach yields more precise inter-procedural analyses, with practical implications for binary security tasks, and an open-source prototype and datasets are released to support reproducibility.

Abstract

Binary code analysis is essential in scenarios where source code is unavailable, with extensive applications across various security domains. However, accurately resolving indirect call targets remains a longstanding challenge in maintaining the integrity of static analysis in binary code. This difficulty arises because the operand of a call instruction (e.g., call rax) remains unknown until runtime, resulting in an incomplete inter-procedural control flow graph (CFG). Previous approaches have struggled with low accuracy and limited scalability. To address these limitations, recent work has increasingly turned to machine learning (ML) to enhance analysis. However, this ML-driven approach faces two significant obstacles: low-quality callsite-callee training pairs and inadequate binary code representation, both of which undermine the accuracy of ML models. In this paper, we introduce CupidCall, a novel approach for resolving indirect calls using graph neural networks. Existing ML models in this area often overlook key elements such as data and code cross-references, which are essential for understanding a program's control flow. In contrast, CupidCall augments CFGs with cross-references, preserving rich semantic information. Additionally, we leverage advanced compiler-level type analysis to generate high-quality callsite-callee training pairs, enhancing model precision and reliability. We further design a graph neural model that leverages augmented CFGs and relational graph convolutions for accurate target prediction. Evaluated against real-world binaries from GitHub and the Arch User Repository on x86_64 architecture, CupidCall achieves an F1 score of 95.2%, outperforming state-of-the-art ML-based approaches. These results highlight CupidCall's effectiveness in building precise inter-procedural CFGs and its potential to advance downstream binary analysis and security applications.
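
To make the idea of a cross-reference augmented CFG and a relational graph convolution concrete, the following is a minimal sketch only, not CupidCall's implementation. The relation names, the helper functions `build_augmented_cfg` and `rgcn_layer`, the toy node features, and the weight matrices are all hypothetical and chosen for illustration. It assumes messages flow from the source to the destination of each edge and that each relation type has its own weight matrix, as in a standard relational graph convolution.

```python
from collections import defaultdict

# Hypothetical relation names for illustration; not taken from the paper.
CONTROL_FLOW = "control_flow"
CODE_XREF = "code_xref"
DATA_XREF = "data_xref"


def build_augmented_cfg(cfg_edges, code_xrefs, data_xrefs):
    """Merge plain CFG edges and cross-reference edges into typed edge lists."""
    graph = defaultdict(list)
    typed = ((CONTROL_FLOW, cfg_edges), (CODE_XREF, code_xrefs), (DATA_XREF, data_xrefs))
    for relation, edges in typed:
        for src, dst in edges:
            graph[relation].append((src, dst))
    return dict(graph)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_layer(features, graph, rel_weights, self_weight):
    """One relational graph convolution step:
    h_v' = ReLU(W_self h_v + sum_r W_r * mean of h_u over incoming r-neighbors u).
    """
    # Group incoming neighbors of every node by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for relation, edges in graph.items():
        for src, dst in edges:
            incoming[dst][relation].append(src)

    updated = {}
    for node, h in features.items():
        acc = matvec(self_weight, h)
        for relation, neighbors in incoming[node].items():
            mean = [
                sum(features[u][i] for u in neighbors) / len(neighbors)
                for i in range(len(h))
            ]
            message = matvec(rel_weights[relation], mean)
            acc = [a + m for a, m in zip(acc, message)]
        updated[node] = [max(0.0, a) for a in acc]  # ReLU
    return updated


# Toy graph: one control-flow edge and one data cross-reference edge.
features = {"entry": [1.0, 0.0], "callsite": [0.0, 1.0], "target": [1.0, 1.0]}
graph = build_augmented_cfg(
    cfg_edges=[("entry", "callsite")],
    code_xrefs=[],
    data_xrefs=[("target", "callsite")],
)
identity = [[1, 0], [0, 1]]
rel_weights = {CONTROL_FLOW: identity, CODE_XREF: identity, DATA_XREF: [[2, 0], [0, 2]]}
updated = rgcn_layer(features, graph, rel_weights, self_weight=identity)
print(updated["callsite"])
```

In this toy setup the callsite node aggregates information along both the control-flow edge and the data cross-reference edge, each through its own weight matrix. A plain CFG would give it only the control-flow message, which illustrates why adding cross-reference edges can enrich the representation the model sees.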
