Large Language Model (LLM) for Software Security: Code Analysis, Malware Analysis, Reverse Engineering
Hamed Jelodar, Samita Bai, Parisa Hamedi, Hesamodin Mohammadian, Roozbeh Razavi-Far, Ali Ghorbani
TL;DR
This review surveys the role of large language models in malware code analysis, spanning static and dynamic malware analysis, reverse engineering, and defense-oriented code inspection. It synthesizes methods, datasets, prompting strategies, and fine-tuning approaches, distinguishing general-purpose LLMs from specialized offensive models. Key contributions include mapping the environments studied (Android, Java, websites, Windows), outlining dataset augmentation and few-/zero-shot techniques, and highlighting both opportunities and limitations such as data quality and safety. The work also identifies future directions, namely knowledge graphs, knowledge-enhanced pre-trained language models (KE-PLMs), and hybrid reverse engineering (RE) pipelines, that could strengthen proactive cybersecurity and resilience against evolving threats.
Abstract
Large Language Models (LLMs) have recently emerged as powerful tools in cybersecurity, offering advanced capabilities in malware detection, generation, and real-time monitoring. Numerous studies have explored their application in this field, demonstrating their effectiveness in identifying novel malware variants, analyzing malicious code structures, and enhancing automated threat analysis. Several transformer-based architectures and LLM-driven models have been proposed to improve malware analysis, leveraging semantic and structural insights to recognize malicious intent more accurately. This study presents a comprehensive review of LLM-based approaches in malware code analysis, summarizing recent advancements, trends, and methodologies. We examine notable scholarly works to map the research landscape, identify key challenges, and highlight emerging innovations in LLM-driven cybersecurity. Additionally, we emphasize the role of static analysis in malware detection and introduce notable datasets and specialized LLMs that support automated malware research. This study serves as a valuable resource for researchers and cybersecurity professionals, offering insights into LLM-powered malware detection and defence strategies while outlining future directions for strengthening cybersecurity resilience.
