Metamorphic Malware Evolution: The Potential and Peril of Large Language Models

Pooria Madani

TL;DR

We introduce a framework for creating self-testing program mutation engines based on LLM/Transformer-based models. The framework serves as an essential tool for testing next-gen metamorphic malware detection engines.

Abstract

Code metamorphism refers to a programming technique in which a program modifies its own code (partially or entirely) consistently and automatically while retaining its core functionality. This technique is often used for online performance optimization and automated crash recovery in certain mission-critical applications. However, it has been misappropriated by malware creators to bypass the signature-based detection measures employed by anti-malware engines. At present, the code mutation engines used by threat actors offer only a limited degree of mutation, which is frequently detectable via static code analysis. The advent of large language models (LLMs), such as ChatGPT 4.0 and Google Bard, may lead to a significant evolution in this landscape. These models have demonstrated a level of algorithm comprehension and code synthesis capability that closely resembles human abilities. This advancement has sparked concerns among experts that such models could be exploited by threat actors to generate sophisticated metamorphic malware. This paper explores the potential of several prominent LLMs for software code mutation, which may be used to reconstruct (with mutation) existing malware code bases or to create new forms of embedded mutation engines for next-gen metamorphic malware. In this work, we introduce a framework for creating self-testing program mutation engines based on LLM/Transformer-based models. The proposed framework serves as an essential tool in testing next-gen metamorphic malware detection engines.
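To make "retaining its core functionality" concrete, the following is a minimal sketch of a unit-test check on a harmless function. It is not the paper's framework: the function `increment`, the helper names, the hand-written variants, and the test cases are all assumptions chosen for illustration. It shows three textually different sources that behave identically, and a test-based filter that rejects a variant whose behavior has changed.

```python
# Illustrative only: hand-written variants of a benign function.
# All names and test cases here are hypothetical, not taken from the paper.
VARIANTS = [
    "def increment(a):\n    return a + 1\n",
    "def increment(a):\n    a += 1\n    return a\n",
    "def increment(a):\n    b = 0  # dead code: never used\n    return a - (-1)\n",
]

# (input, expected output) pairs that define the required behavior.
TEST_CASES = [(0, 1), (-1, 0), (41, 42)]


def load_function(source, name="increment"):
    """Execute a source string in a fresh namespace and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def passes_unit_tests(source):
    """Return True if the function defined by `source` satisfies every test case."""
    func = load_function(source)
    return all(func(x) == expected for x, expected in TEST_CASES)


results = [passes_unit_tests(src) for src in VARIANTS]
distinct_sources = len(set(VARIANTS))
print(results, distinct_sources)
```

A variant that alters behavior, for example one returning `a + 2`, fails the same check. Unit tests only confirm equivalence on the tested inputs, so the quality of the filter depends on how well the test cases cover the intended behavior.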

Paper Structure

This paper contains 13 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Three semantically identical Python expressions for increasing the value of variable ‘a’ by 1.
  • Figure 2: Two semantically identical Python code snippets demonstrating expression/instruction permutation.
  • Figure 3: Python code snippet demonstrating dead code insertion: variable ‘b’ is never used.
  • Figure 4: Region where an LLM can be measured based on possible $pass@k$ and $variation@k$ values.
  • Figure 5: The proposed framework for using LLMs capable of code/program synthesis for source code mutation guided by unit test procedures.
  • ...and 2 more figures