Explainable Malware Detection through Integrated Graph Reduction and Learning Techniques
Hesamodin Mohammadian, Griffin Higgins, Samuel Ansong, Roozbeh Razavi-Far, Ali A. Ghorbani
TL;DR
This work tackles malware detection on control flow graphs (CFGs) and function call graphs (FCGs) by integrating graph reduction and explainability into a GNN-based pipeline. It introduces four reduction techniques, Leaf Prune, Comp Prune, K-core, and Walk Index Sparsification, to shrink large program graphs while preserving discriminative information. Two node embeddings, Function Name Embedding (FNE) and Assembly Embedding (AE), feed a graph convolutional network (GCN) classifier. The framework is augmented with GNNExplainer to provide interpretable subgraph explanations. The results indicate that leaf pruning often yields the best efficiency-accuracy trade-off and that AE generally outperforms FNE. The approach shows promise for scalable, transparent malware detection on real-world datasets such as BODMAS, Dike, and PMML, enabling faster analysis with meaningful explanations for security analysts.
Abstract
Control Flow Graphs (CFGs) and Function Call Graphs (FCGs) have become pivotal in providing a detailed understanding of program execution and effectively characterizing the behavior of malware. These graph-based representations, when combined with Graph Neural Networks (GNNs), have shown promise in developing high-performance malware detectors. However, challenges remain due to the large size of these graphs and the inherent opacity of the GNN decision-making process. This paper addresses these issues by developing several graph reduction techniques to reduce graph size and by applying the state-of-the-art GNNExplainer to enhance the interpretability of GNN outputs. The analysis demonstrates that integrating the proposed graph reduction techniques with GNNExplainer in the malware detection framework significantly reduces graph size while preserving high performance, providing an effective balance between efficiency and transparency in malware detection.
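To make the idea of graph reduction concrete, the following is a minimal Python sketch of two reductions on a toy function call graph. It is an illustration under stated assumptions, not the paper's implementation: the graph is treated as undirected, "leaf pruning" is assumed here to mean a single pass that removes nodes with at most one neighbour, and the k-core routine follows the standard definition (repeatedly delete nodes of degree below k). The function names `build_undirected`, `leaf_prune`, and `k_core`, and the toy edge list, are hypothetical.

```python
from collections import defaultdict


def build_undirected(edges):
    """Adjacency sets for an undirected view of a call graph (self-loops dropped)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u]  # touching the key registers the node even if it ends up isolated
        adj[v]
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def leaf_prune(adj):
    """Assumed leaf pruning: drop, in one pass, every node with at most one neighbour."""
    leaves = {n for n, nbrs in adj.items() if len(nbrs) <= 1}
    return {n: nbrs - leaves for n, nbrs in adj.items() if n not in leaves}


def k_core(adj, k):
    """Standard k-core: repeatedly delete nodes whose degree is below k."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}  # work on a copy
    queue = [n for n, nbrs in adj.items() if len(nbrs) < k]
    while queue:
        n = queue.pop()
        if n not in adj:  # already removed via an earlier queue entry
            continue
        for m in adj.pop(n):
            adj[m].discard(n)
            if len(adj[m]) < k:
                queue.append(m)
    return adj


# Hypothetical toy call graph: (caller, callee) pairs.
edges = [("main", "a"), ("main", "b"), ("a", "b"),
         ("b", "c"), ("c", "d"), ("main", "e")]
graph = build_undirected(edges)

pruned = leaf_prune(graph)   # removes the leaves "d" and "e"
core = k_core(graph, 2)      # keeps only the triangle main, a, b
print(sorted(pruned), sorted(core))
```

The single-pass leaf prune leaves `c` in place even though it becomes a leaf after `d` is removed, whereas the 2-core keeps peeling until only the densely connected triangle remains. This difference is one way to see why the choice of reduction affects how much structure a classifier still gets to see.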
