LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
Md Ajwad Akil, Adrian Shuai Li, Imtiaz Karim, Arun Iyengar, Ashish Kundu, Vinny Parla, Elisa Bertino
TL;DR
The paper investigates whether off-the-shelf Large Language Models (LLMs) can be steered, without fine-tuning, to generate functional malware variants from source code. It introduces LLMalMorph, a two-module framework that uses function-level abstract syntax tree (AST) extraction and six prompt-guided code transformations to produce diverse, compilable variants while preserving core semantics, with a human in the loop to assist with debugging. The authors generate 618 variants from 10 Windows malware samples and report detection-rate reductions of up to 15% on VirusTotal and 8–13% on Hybrid Analysis, attack success rates of up to 91% against ML detectors, and semantic preservation in a majority of evasive variants. The work draws practical lessons on prompt design, transformation strategy, and the trade-off between manual effort and evasion efficacy, and it discusses responsible disclosure and ethics given the dual-use nature of the technology. Overall, LLMalMorph demonstrates both the feasibility and the risks of source-code-level, LLM-guided malware variant generation, and it motivates defensive research and governance around AI-assisted cyber threats.
Abstract
Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advancements, this paper explores the feasibility of using LLMs to modify malware source code and generate variants. We introduce LLMalMorph, a semi-automated framework that leverages the semantic and syntactic code comprehension of LLMs to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and employs custom-engineered prompts coupled with strategically defined code transformations to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse Windows malware samples of varying types, complexity, and functionality and generated 618 variants. Our experiments demonstrate that LLMalMorph variants can effectively evade antivirus engines, achieving typical detection rate reductions of 10-15% across multiple complex samples. Furthermore, without explicitly targeting learning-based detectors, LLMalMorph attained attack success rates of up to 91% against a Machine Learning (ML) based malware detector. We also discuss the limitations of current LLM capabilities in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.
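The two headline metrics, detection rate reduction and attack success rate, can be made concrete with a small sketch. This section does not state the paper's exact definitions, so the code below rests on assumptions: detection rate is taken as the fraction of antivirus engines that flag a sample, the reduction is expressed in percentage points relative to the original sample, and attack success rate is the fraction of variants that a detector fails to flag. The function names and the numbers in the usage lines are hypothetical and are not results from the paper.

```python
from typing import Sequence


def detection_rate(detections: int, engines: int) -> float:
    """Fraction of engines that flagged a sample (assumed definition)."""
    if engines <= 0:
        raise ValueError("engines must be positive")
    return detections / engines


def detection_rate_reduction(baseline_rate: float, variant_rate: float) -> float:
    """Drop in detection rate, in percentage points (assumed convention)."""
    return (baseline_rate - variant_rate) * 100.0


def attack_success_rate(flagged: Sequence[bool]) -> float:
    """Fraction of variants the detector did NOT flag (assumed definition)."""
    if not flagged:
        raise ValueError("need at least one variant")
    evaded = sum(1 for is_flagged in flagged if not is_flagged)
    return evaded / len(flagged)


# Hypothetical numbers, for illustration only.
baseline = detection_rate(detections=30, engines=60)
variant = detection_rate(detections=24, engines=60)
reduction_points = round(detection_rate_reduction(baseline, variant), 1)
asr = attack_success_rate([True, False, False, False])
```

Whether a reported "10-15%" is an absolute drop in percentage points or a drop relative to the baseline changes how the figure should be read, so it is worth checking the convention against the paper's evaluation section.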
