A Research and Development Portfolio of GNN-Centric Malware Detection, Explainability, and Dataset Curation
Hossein Shokouhinejad, Griffin Higgins, Roozbeh Razavi-Far, Ali A. Ghorbani
TL;DR
The paper addresses the core challenges of applying Graph Neural Networks (GNNs) to malware detection through a cohesive portfolio of six interconnected studies that target efficiency, interpretability, and reproducibility. It progresses from a foundational survey to graph reduction techniques (including Node-Centric Pruning with walks of fixed length $L$) and integrated pruning-learning frameworks. It then turns to explainability and explanation consistency, using stability-promoting methods and dual prototype-based explanations. An ensemble framework with attention-guided stacking combines diverse GNNs and provides ensemble-aware explanations. In parallel, curated dataset releases of control flow graphs (CFGs) and function call graphs (FCGs) extracted from Portable Executable (PE) files enable reproducible research. Together, these contributions establish an end-to-end workflow that improves the scalability, transparency, and practical deployability of GNN-based malware detection, and they supply benchmarks for future work.
Abstract
Graph Neural Networks (GNNs) have become an effective tool for malware detection because they operate on graph-structured representations that capture how a program executes. However, key challenges remain in scalability, interpretability, and the availability of reliable datasets. This paper brings together six related studies that collectively address these issues. The portfolio begins with a survey of graph-based malware detection and explainability, then advances to new graph reduction methods, integrated reduction-learning approaches, and investigations into the consistency of explanations. It also introduces dual explanation techniques based on subgraph matching and develops ensemble-based models with attention-guided stacked GNNs to improve interpretability. In parallel, curated datasets of control flow graphs are released to support reproducibility and enable future research. Together, these contributions form a coherent line of research that strengthens GNN-based malware detection by improving efficiency, increasing transparency, and providing solid experimental foundations.
