Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection

George Edwards, Mahdi Eslamimehr

TL;DR

This paper introduces a hybrid analysis framework that combines concolic execution with LLM-augmented path prioritization and deep-learning-based vulnerability classification to detect zero-day AI-generated malware with provable guarantees.

Abstract

The weaponization of LLMs for automated malware generation poses an existential threat to conventional detection paradigms. AI-generated malware exhibits polymorphic, metamorphic, and context-aware evasion capabilities that render signature-based and shallow heuristic defenses obsolete. This paper introduces a novel hybrid analysis framework, CogniCrypt, that combines concolic execution with LLM-augmented path prioritization and deep-learning-based vulnerability classification to detect zero-day AI-generated malware with provable guarantees. We formalize the detection problem within a first-order temporal logic over program execution traces, define a lattice-theoretic abstraction for path constraint spaces, and prove both the soundness and relative completeness of our detection algorithm, assuming classifier correctness. The framework introduces three novel algorithms: (i) an LLM-guided concolic exploration strategy that reduces the average number of explored paths by 73.2% compared to depth-first search while maintaining equivalent malicious-path coverage; (ii) a transformer-based path-constraint classifier trained on symbolic execution traces; and (iii) a feedback loop that iteratively refines the LLM's prioritization policy using reinforcement learning from detection outcomes. We provide a comprehensive implementation built upon angr 9.2, Z3 4.12, Hugging Face Transformers 4.38, and PyTorch 2.2, with configuration details enabling reproducibility. Experimental evaluation on EMBER, Malimg, SOREL-20M, and a novel AI-Gen-Malware benchmark comprising 2,500 LLM-synthesized samples demonstrates that CogniCrypt achieves 98.7% accuracy on conventional malware and 97.5% accuracy on AI-generated threats, outperforming the ClamAV, YARA, MalConv, and EMBER-GBDT baselines by margins of 8.4 to 52.2 percentage points on AI-generated samples.
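To make the idea behind algorithm (i) concrete, the following is a minimal sketch of priority-guided path exploration compared with depth-first search on a toy branch tree. It is not the paper's implementation: the tree, the `MALICIOUS` leaf, and the `STUB_SCORES` table that stands in for an LLM's path priority are all hypothetical, and a real system would obtain successor states from a concolic engine rather than from a dictionary.

```python
import heapq
from typing import Callable, Dict, List

# Hypothetical toy program: each node is a path prefix, children are branch successors.
TREE: Dict[str, List[str]] = {
    "entry": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}
MALICIOUS = {"b2"}  # assumed: the only path that exhibits malicious behaviour

# Assumed stand-in for an LLM-produced priority in [0, 1]; higher means "explore first".
STUB_SCORES: Dict[str, float] = {
    "entry": 0.5, "a": 0.1, "b": 0.9,
    "a1": 0.1, "a2": 0.1, "b1": 0.3, "b2": 0.95,
}


def explore_dfs(root: str) -> int:
    """Return how many states plain DFS visits before reaching a malicious path."""
    stack = [root]
    explored = 0
    while stack:
        node = stack.pop()
        explored += 1
        if node in MALICIOUS:
            return explored
        # Push children in reverse so the left-most branch is visited first.
        stack.extend(reversed(TREE[node]))
    return explored


def explore_guided(root: str, score: Callable[[str], float]) -> int:
    """Return how many states a max-priority worklist visits before a malicious path."""
    tie = 0  # insertion counter keeps heap ordering deterministic on equal scores
    heap = [(-score(root), tie, root)]  # negate: heapq is a min-heap
    explored = 0
    while heap:
        _, _, node = heapq.heappop(heap)
        explored += 1
        if node in MALICIOUS:
            return explored
        for child in TREE[node]:
            tie += 1
            heapq.heappush(heap, (-score(child), tie, child))
    return explored


dfs_count = explore_dfs("entry")
guided_count = explore_guided("entry", STUB_SCORES.__getitem__)
print(f"DFS visited {dfs_count} states, guided search visited {guided_count}")
```

On this toy tree the guided search reaches the malicious leaf after visiting fewer states than DFS, but only because the stub scores happen to rank the right branch highly. The reduction a real prioritizer achieves depends entirely on the quality of its scores.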

Paper Structure

This paper contains 32 sections, 4 theorems, 6 equations, 8 figures, 10 tables, 5 algorithms.

Key Result

Theorem 1

Let $P$ be a program and $\Phi_{\text{mal}}$ be a malicious behavior specification. If CogniCrypt reports $P$ as malicious, then there exists an execution trace $\tau \in \mathcal{T}(P)$ and a formula $\varphi_i \in \Phi_{\text{mal}}$ such that $\tau \models \varphi_i$.
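Written as a single implication, the soundness statement reads as follows. The predicate $\textsf{Report}$ is notation introduced here for illustration and does not appear in the source; the remaining symbols are those of Theorem 1.

```latex
\[
\textsf{Report}(P) = \textsf{malicious}
\;\Longrightarrow\;
\exists\, \tau \in \mathcal{T}(P),\ \exists\, \varphi_i \in \Phi_{\text{mal}} :\ \tau \models \varphi_i
\]
% Contrapositive: if no trace of P satisfies any formula in the specification,
% then P is not reported as malicious (no false positives relative to \Phi_{mal}).
\[
\bigl(\forall\, \tau \in \mathcal{T}(P),\ \forall\, \varphi_i \in \Phi_{\text{mal}} :\ \tau \not\models \varphi_i\bigr)
\;\Longrightarrow\;
\textsf{Report}(P) \neq \textsf{malicious}
\]
```

Per the abstract, this guarantee is proved under the assumption that the classifier is correct, so it should be read as conditional on that assumption.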

Figures (8)

  • Figure 1: CogniCrypt System Architecture. The concolic execution engine explores paths guided by the LLM prioritizer. Path constraints are classified by the vulnerability classifier, and detection outcomes feed back via RL to refine the LLM's prioritization policy.
  • Figure 2: Detection accuracy comparison across benchmark datasets. CogniCrypt maintains consistently high accuracy, while signature-based tools (ClamAV, YARA) degrade severely on AI-generated malware.
  • Figure 3: F1-Score comparison across different LLM backends used in CogniCrypt. All LLMs achieve strong performance, with GPT-4 leading marginally.
  • Figure 4: Path exploration efficiency. LLM-guided exploration achieves 95% malicious code coverage with 73.2% fewer paths than DFS and 68.5% fewer than BFS.
  • Figure 5: ROC curves on the AI-Gen-Malware dataset. CogniCrypt achieves an AUC of 0.993, significantly outperforming all baselines.
  • ...and 3 more figures
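For readers unfamiliar with the AUC metric reported in Figure 5, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one. The labels and scores are made-up toy values, not data from the paper, and the function name `roc_auc` is chosen here for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = malicious, 0 = benign (illustrative values only).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 positive/negative pairs are ranked correctly
```

The quadratic pairwise loop is chosen for readability; a sort-based computation gives the same value and scales better on large benchmarks.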

Theorems & Definitions (17)

  • Definition 1: Program Model
  • Definition 2: Execution Trace
  • Definition 3: Symbolic State
  • Definition 4: Path Constraint Space
  • Definition 5: Syntax of $\mathcal{L}_{\text{CogniCrypt}}$
  • Definition 6: Malicious Behavior Specification
  • Definition 7: Concolic Execution
  • Definition 8: Path Priority Function
  • Definition 9: Priority Queue Ordering
  • Theorem 1: Soundness
  • ...and 7 more
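The relationship between an execution trace (Definition 2), a malicious behavior specification (Definition 6), and the soundness theorem can be illustrated with a small sketch. The syntax of $\mathcal{L}_{\text{CogniCrypt}}$ is not reproduced on this page, so the code assumes a deliberately tiny stand-in: a formula is an ordered list of API names that must all occur, in order, somewhere in the trace. The `Event`, `EventuallyInOrder`, `satisfies`, and `find_witness` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """One observable step of a trace (assumed here to be an API call)."""
    api: str


@dataclass(frozen=True)
class EventuallyInOrder:
    """Assumed formula shape: the APIs occur in this order, not necessarily adjacent."""
    apis: Tuple[str, ...]


def satisfies(trace: Sequence[Event], phi: EventuallyInOrder) -> bool:
    """Check trace |= phi by scanning for phi.apis as a subsequence of the trace."""
    matched = 0
    for event in trace:
        if matched < len(phi.apis) and event.api == phi.apis[matched]:
            matched += 1
    return matched == len(phi.apis)


def find_witness(
    traces: Iterable[Sequence[Event]],
    spec: Sequence[EventuallyInOrder],
) -> Optional[Tuple[Sequence[Event], EventuallyInOrder]]:
    """Return a (trace, formula) pair justifying a 'malicious' report, or None."""
    for trace in traces:
        for phi in spec:
            if satisfies(trace, phi):
                return trace, phi
    return None


# Illustrative specification and traces (not taken from the paper).
spec = [EventuallyInOrder(("read", "encrypt", "delete"))]
suspicious = [Event("open"), Event("read"), Event("encrypt"), Event("write"), Event("delete")]
benign = [Event("open"), Event("read"), Event("close")]

print(find_witness([benign, suspicious], spec) is not None)  # a witness exists
print(find_witness([benign], spec))                          # no witness, so no report
```

Reporting only when `find_witness` returns a pair mirrors the shape of the soundness statement: every report comes with a concrete trace and formula that justify it.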