Adapting Novelty towards Generating Antigens for Antivirus systems

Ritwik Murali, C Shunmuga Velayutham

TL;DR

This work tackles the brittleness of signature-based antivirus detection by introducing a modular assembly-language framework, MAGE, that uses novelty-search-guided evolutionary algorithms to generate diverse, valid malware variants while preserving malicious intent. By representing malware as linear assembly programs and applying carefully constrained mutation and crossover operators, the framework promotes high-diversity variant generation and constructs a dataset of antigens that can train malware analysis engines. The key contributions include a novelty-based quality indicator, a set of assembly-level transformation operators (e.g., $T_{FI}$, $T_{FJ}$, $T_{UB}$, $T_{CZJ}$, $T_{CNZJ}$, $T_{CBI}$), and empirical evidence that evolved variants evade over 98% of scanners on VirusTotal, illustrating the practical value for proactive defense. The resulting antigen dataset and modular architecture offer a flexible platform for advancing malware detection and resilience against evolving threats.

Abstract

It is well known that anti-malware scanners depend on malware signatures to identify malware. However, even minor modifications to a malware's code structure change its signature, enabling the variant to evade detection by scanners. There is therefore a need for a proactively generated dataset of malware variants to help automated antivirus scanners detect such diverse variants. This paper proposes and demonstrates a generic framework, based on assembly source code, that allows any evolutionary algorithm to generate diverse potential variants of an input malware that retain its maliciousness yet are capable of evading antivirus scanners. Generic code transformation functions and a novelty-search-supported quality metric are proposed as components of the framework, to be used respectively as the variation operators and the fitness function of evolutionary algorithms. The results demonstrate the effectiveness of the framework in generating diverse variants, and the generated variants are shown to evade over 98% of popular antivirus scanners. The malware variants evolved by the framework can serve as antigens that help malware analysis engines improve their malware detection algorithms.
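To make the idea of a novelty-based quality metric concrete, the following is a minimal sketch of how novelty is commonly scored in novelty search: the mean distance from a candidate to its k nearest neighbours in the population. It is not the metric defined in the paper. The function names (`bigrams`, `distance`, `novelty`), the bigram Jaccard distance, and the choice of `k` are all assumptions made for illustration, and individuals are treated simply as abstract token sequences.

```python
from typing import Sequence, Set, Tuple


def bigrams(tokens: Sequence[str]) -> Set[Tuple[str, str]]:
    """Return the set of adjacent token pairs in a linear token sequence."""
    return set(zip(tokens, tokens[1:]))


def distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard distance between bigram sets (assumed measure, not the paper's).

    0.0 means the bigram sets are identical, 1.0 means they are disjoint.
    """
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    if not union:
        return 0.0  # both sequences are too short to contain any bigram
    return 1.0 - len(ga & gb) / len(union)


def novelty(candidate: Sequence[str],
            others: Sequence[Sequence[str]],
            k: int = 3) -> float:
    """Mean distance from the candidate to its k nearest neighbours."""
    if not others:
        return 0.0
    nearest = sorted(distance(candidate, other) for other in others)[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Placeholder token sequences, used only to exercise the scoring function.
    population = [
        ["mov", "add", "jmp"],
        ["mov", "add", "jmp"],
        ["nop", "xor", "ret"],
    ]
    for index, individual in enumerate(population):
        rest = population[:index] + population[index + 1:]
        print(index, novelty(individual, rest, k=2))
```

Under this scoring, an individual that closely resembles its neighbours receives a low score, so an evolutionary algorithm using it as the fitness function is pushed toward structurally different individuals rather than toward a single objective.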

Paper Structure

This paper contains 10 sections, 2 equations, 6 figures, 2 tables, 2 algorithms.

Figures (6)

  • Figure 1: Linear representation of virus code.
  • Figure 2: Graph representation of virus code.
  • Figure 3: Forced JMP transformation.
  • Figure 4: Code block interchange transformation.
  • Figure 5: Similarity values of initial and final population using $\alpha$ and $\beta$.
  • ...and 1 more figure