SoK: Leveraging Transformers for Malware Analysis
Pradip Kunwar, Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Elisa Bertino
TL;DR
This SoK surveys how transformers are adapted to malware analysis, organizing existing work into taxonomies by transformer variants and by feature representations. It demonstrates that encoder-focused, pre-trained, and custom-enhanced transformers can achieve high accuracy across static, dynamic, and multi-modal malware tasks, while also addressing obfuscation, efficiency, and deployment challenges. By cataloging datasets and detailing practical perspectives, it provides a foundation for designing robust, scalable transformer-based malware analysis systems and guides future research in multi-modal fusion, few-shot learning, and adversarial robustness. Overall, the work highlights the potential and limitations of transformer models in real-world cybersecurity contexts and outlines concrete directions for advancing this emerging field.
Abstract
The introduction of transformers has been an important breakthrough for AI research and applications, as transformers are the foundation of Generative AI. A promising application domain for transformers is cybersecurity, in particular malware analysis, because transformer models are flexible in handling long sequential features and in capturing contextual relationships. However, as the use of transformers for malware analysis is still in its infancy, it is critical to evaluate, systematize, and contextualize the existing literature to foster future research. This Systematization of Knowledge (SoK) paper aims to provide a comprehensive analysis of transformer-based approaches designed for malware analysis. Based on our systematic analysis of existing knowledge, we propose taxonomies structured around: (a) how different transformers are adapted, organized, and modified across various use cases; and (b) how diverse feature types and their representation capabilities are reflected. We also provide an inventory of datasets used to explore multiple research avenues in the use of transformers for malware analysis, and we discuss open challenges and future research directions. We believe that this SoK paper will assist the research community in gaining detailed insights from existing work and will serve as a foundational resource for implementing novel research using transformers for malware analysis.
