Adapting Novelty towards Generating Antigens for Antivirus systems
Ritwik Murali, C Shunmuga Velayutham
TL;DR
This work tackles the brittleness of signature-based antivirus detection by introducing a modular assembly-language framework, MAGE, that uses novelty-search-guided evolutionary algorithms to generate diverse, valid malware variants while preserving malicious intent. By representing malware as linear assembly programs and applying carefully constrained mutation and crossover operators, the framework promotes high-diversity variant generation and builds a dataset of antigens that can be used to train malware analysis engines. The key contributions include a novelty-based quality indicator, a set of assembly-level transformation operators (e.g., $T_{FI}$, $T_{FJ}$, $T_{UB}$, $T_{CZJ}$, $T_{CNZJ}$, $T_{CBI}$), and empirical evidence that evolved variants evade over 98% of scanners on VirusTotal, illustrating the framework's practical value for proactive defense. The resulting antigen dataset and modular architecture offer a flexible platform for advancing malware detection and resilience against evolving threats.
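The novelty-based quality indicator builds on novelty search, whose standard score rates a candidate by its mean distance to its k nearest neighbors among previously seen individuals. The sketch below shows only that standard score under stated assumptions; it is not the paper's exact metric. It treats each variant as a list of assembly instruction strings (the linear representation mentioned above) and assumes token-level Levenshtein edit distance as the distance measure. The names `levenshtein` and `novelty`, the parameter `k` and the toy programs are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two instruction sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y (free if equal)
        prev = curr
    return prev[-1]


def novelty(candidate: List[str], others: List[List[str]], k: int = 3) -> float:
    """Mean distance from `candidate` to its k nearest neighbors in `others`
    (e.g. current population plus archive, excluding the candidate itself)."""
    dists = sorted(levenshtein(candidate, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


# Toy example: x86 assembly lines as plain strings (illustrative only).
parent = ["mov eax, 1", "xor ebx, ebx", "int 0x80"]
archive = [parent, ["mov eax, 1", "nop", "xor ebx, ebx", "int 0x80"]]
child = ["mov eax, 1", "jmp skip", "nop", "skip:", "xor ebx, ebx", "int 0x80"]
print(novelty(child, archive, k=2))  # 2.5
```

The child scores 2.5 because it differs from the two archived programs by three and two inserted lines, respectively. A complete fitness function would also have to reject variants that fail to assemble or lose their malicious behavior; that filtering is not shown here.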
Abstract
It is well known that anti-malware scanners depend on malware signatures to identify malware. However, even minor modifications to the structure of malware code result in a change in the malware signature, thus enabling the variant to evade detection by scanners. There is therefore a need for a proactively generated dataset of malware variants to help automated antivirus scanners detect such diverse variants. This paper proposes and demonstrates a generic, assembly-source-code-based framework that enables any evolutionary algorithm to generate diverse candidate variants of an input malware that retain its maliciousness yet are capable of evading antivirus scanners. The framework comprises generic code transformation functions, used as variation operators, and a novelty-search-supported quality metric, used as the fitness function for the evolutionary algorithm. The results demonstrate the effectiveness of the framework in generating diverse variants, and the generated variants are shown to evade over 98% of popular antivirus scanners. The malware variants evolved by the framework can serve as antigens to help malware analysis engines improve their malware detection algorithms.
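The generic code transformation functions act as variation operators. The paper's own operators ($T_{FI}$, $T_{FJ}$, $T_{UB}$, $T_{CZJ}$, $T_{CNZJ}$, $T_{CBI}$) are not defined in this section. The following is therefore only a hypothetical example of the kind of semantics-preserving transformation such an operator could perform: it inserts an unconditional `jmp` over never-executed filler into a linear list of x86 assembly lines. The function name `insert_jump_over_junk`, the `JUNK` list and the `__skipN` label scheme are illustrative assumptions. The sketch also assumes the list holds only a code section and that no other branch targets the inserted filler.

```python
import random
from typing import List

# Hypothetical filler instructions (assumption). They never execute,
# because the inserted jmp always skips over them.
JUNK = ["nop", "xchg eax, eax", "mov ecx, ecx"]


def insert_jump_over_junk(program: List[str], rng: random.Random) -> List[str]:
    """Return a new program with `jmp L`, 1-3 filler lines and `L:` inserted at a
    random position. Execution reaching that point jumps straight to L, so the
    original instructions still run in their original order."""
    # Choose a label that does not already exist in the program.
    n = 0
    while f"__skip{n}:" in program:
        n += 1
    label = f"__skip{n}"
    pos = rng.randint(0, len(program))  # insertion index; both ends allowed
    filler = [rng.choice(JUNK) for _ in range(rng.randint(1, 3))]
    # Slicing builds a new list, so the parent program is left untouched.
    return program[:pos] + [f"jmp {label}"] + filler + [f"{label}:"] + program[pos:]


rng = random.Random(0)  # fixed seed for a reproducible example
parent = ["mov eax, 1", "xor ebx, ebx", "int 0x80"]
child = insert_jump_over_junk(parent, rng)
print("\n".join(child))
```

Deleting the inserted `jmp`, filler and label from `child` recovers `parent` exactly. This invariant keeps the program's behavior intact while changing its instruction sequence, and therefore typically its byte-level signature.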
