Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection

George Edwards; Mahdi Eslamimehr

TL;DR

This paper introduces a novel hybrid analysis framework that synergistically combines concolic execution with LLM-augmented path prioritization and deep-learning-based vulnerability classification to detect zero-day AI-generated malware with provable guarantees.

Abstract

The weaponization of LLMs for automated malware generation poses an existential threat to conventional detection paradigms. AI-generated malware exhibits polymorphic, metamorphic, and context-aware evasion capabilities that render signature-based and shallow heuristic defenses obsolete. This paper introduces a novel hybrid analysis framework that synergistically combines concolic execution with LLM-augmented path prioritization and deep-learning-based vulnerability classification to detect zero-day AI-generated malware with provable guarantees. We formalize the detection problem within a first-order temporal logic over program execution traces, define a lattice-theoretic abstraction for path constraint spaces, and prove both the soundness and the relative completeness of our detection algorithm, assuming classifier correctness. The framework comprises three novel algorithms: (i) an LLM-guided concolic exploration strategy that reduces the average number of explored paths by 73.2% compared to depth-first search while maintaining equivalent malicious-path coverage; (ii) a transformer-based path-constraint classifier trained on symbolic execution traces; and (iii) a feedback loop that iteratively refines the LLM's prioritization policy using reinforcement learning from detection outcomes. We provide a comprehensive implementation built on angr 9.2, Z3 4.12, Hugging Face Transformers 4.38, and PyTorch 2.2, with configuration details that support reproducibility. Experimental evaluation on EMBER, Malimg, SOREL-20M, and a new AI-Gen-Malware benchmark of 2,500 LLM-synthesized samples shows that the framework achieves 98.7% accuracy on conventional malware and 97.5% accuracy on AI-generated threats, outperforming the ClamAV, YARA, MalConv, and EMBER-GBDT baselines by 8.4 to 52.2 percentage points on AI-generated samples.
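To make the concolic-execution step concrete, here is a minimal, self-contained sketch of generational concolic search on a hypothetical toy program: run it on a concrete input, record each branch predicate with its observed outcome, then negate one branch at a time and ask a solver for an input that drives execution down the new path. It deliberately simplifies the framework described above. Predicates are opaque Python callables rather than symbolic expressions, the "solver" is a brute-force search over a bounded integer domain standing in for an SMT solver such as Z3, and there is no LLM guidance. All names (`program`, `solve`, `concolic`) are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One branch record: the branch predicate over the input and the outcome seen on a concrete run.
Branch = Tuple[Callable[[int], bool], bool]


def program(x: int, trace: List[Branch]) -> str:
    """Hypothetical toy program under test; every branch logs (predicate, outcome)."""
    def branch(pred: Callable[[int], bool]) -> bool:
        taken = pred(x)
        trace.append((pred, taken))
        return taken

    if branch(lambda v: v > 100):
        if branch(lambda v: v % 7 == 3):
            return "payload"  # stands in for a malicious path
        return "benign-large"
    return "benign-small"


def solve(constraints: List[Branch], domain: range) -> Optional[int]:
    """Brute-force stand-in for an SMT solver: first input meeting every constraint."""
    for v in domain:
        if all(pred(v) == want for pred, want in constraints):
            return v
    return None


def concolic(seed: int, domain: range, max_runs: int = 20) -> Dict[str, int]:
    """Generational concolic search: run concretely, then flip each branch to reach new paths."""
    reached: Dict[str, int] = {}
    worklist = [seed]
    executed = set()
    flipped_prefixes = set()
    while worklist and len(executed) < max_runs:
        x = worklist.pop()
        if x in executed:
            continue
        executed.add(x)
        trace: List[Branch] = []
        reached.setdefault(program(x, trace), x)
        for i, (pred, taken) in enumerate(trace):
            # Identify the target path prefix by its branch outcomes, with branch i negated.
            key = tuple(t for _, t in trace[:i]) + (not taken,)
            if key in flipped_prefixes:
                continue
            flipped_prefixes.add(key)
            new_input = solve(trace[:i] + [(pred, not taken)], domain)
            if new_input is not None:
                worklist.append(new_input)
    return reached


print(concolic(0, range(0, 1000)))
# {'benign-small': 0, 'payload': 101, 'benign-large': 102}
```

Starting from the single seed `0`, the search reaches all three paths, including the "payload" branch guarded by two conditions. A real engine such as angr would record symbolic constraints and query Z3 instead of enumerating the domain.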

Paper Structure

This paper contains 32 sections, 4 theorems, 6 equations, 8 figures, 10 tables, and 5 algorithms.

Introduction
Related Work
Concolic and Symbolic Execution for Security Analysis
Machine Learning for Malware Detection
AI-Generated Malware and Adversarial Threats
Hybrid Approaches
Theoretical Foundations
Program Model and Execution Semantics
Temporal Logic for Malicious Behavior Specification
Concolic Execution Formalization
LLM-Guided Path Prioritization
Soundness and Completeness
Threat Model
Algorithms
Implementation
...and 17 more sections

Key Result

Theorem 1

Let $P$ be a program and $\Phi_{\text{mal}}$ be a malicious behavior specification. If CogniCrypt reports $P$ as malicious, then there exists an execution trace $\tau \in \mathcal{T}(P)$ and a formula $\varphi_i \in \Phi_{\text{mal}}$ such that $\tau \models \varphi_i$.
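Theorem 1 says that every malicious report is backed by a witness: an execution trace that satisfies some formula in $\Phi_{\text{mal}}$. The sketch below illustrates that shape under simplifying assumptions. It uses a propositional, finite-trace temporal logic (LTLf-style) rather than the paper's first-order logic, represents a trace as a sequence of event-name sets, and defines a hypothetical `detect` function that returns a (trace index, specification name) witness or `None`. Because a report is produced only when `holds` succeeds, soundness holds by construction here, relative to the correctness of `holds` itself.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

# Formulas are nested tuples, e.g. ("eventually", ("atom", "net_send")).
Formula = Tuple
Trace = Sequence[FrozenSet[str]]


def holds(trace: Trace, i: int, f: Formula) -> bool:
    """Finite-trace semantics: does the suffix trace[i:] satisfy f?"""
    op = f[0]
    if op == "atom":
        return i < len(trace) and f[1] in trace[i]
    if op == "not":
        return not holds(trace, i, f[1])
    if op == "and":
        return holds(trace, i, f[1]) and holds(trace, i, f[2])
    if op == "or":
        return holds(trace, i, f[1]) or holds(trace, i, f[2])
    if op == "next":
        return i + 1 < len(trace) and holds(trace, i + 1, f[1])
    if op == "eventually":
        return any(holds(trace, j, f[1]) for j in range(i, len(trace)))
    if op == "always":
        return all(holds(trace, j, f[1]) for j in range(i, len(trace)))
    if op == "until":
        return any(
            holds(trace, j, f[2]) and all(holds(trace, k, f[1]) for k in range(i, j))
            for j in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op!r}")


def detect(traces: Sequence[Trace], specs: Dict[str, Formula]) -> Optional[Tuple[int, str]]:
    """Report only with a witness (trace index, spec name) whose trace satisfies the spec."""
    for idx, tau in enumerate(traces):
        for name, phi in specs.items():
            if holds(tau, 0, phi):
                return idx, name
    return None


# Hypothetical specification: credentials are read and later sent over the network.
specs = {
    "exfiltration": ("eventually", ("and", ("atom", "read_cred"),
                                    ("eventually", ("atom", "net_send")))),
}
benign = [frozenset({"open"}), frozenset({"net_send"}), frozenset({"read_cred"})]
malicious = [frozenset({"read_cred"}), frozenset({"encrypt"}), frozenset({"net_send"})]
print(detect([benign, malicious], specs))  # (1, 'exfiltration')
print(detect([benign], specs))             # None
```

The benign trace contains both events but in the wrong order, so no witness exists and nothing is reported.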

Figures (8)

Figure 1: CogniCrypt System Architecture. The concolic execution engine explores paths guided by the LLM prioritizer. Path constraints are classified by the vulnerability classifier, and detection outcomes feed back via RL to refine the LLM's prioritization policy.
Figure 2: Detection accuracy comparison across benchmark datasets. CogniCrypt maintains consistently high accuracy, while signature-based tools (ClamAV, YARA) degrade severely on AI-generated malware.
Figure 3: F1-Score comparison across different LLM backends used in CogniCrypt. All LLMs achieve strong performance, with GPT-4 leading marginally.
Figure 4: Path exploration efficiency. LLM-guided exploration achieves 95% malicious code coverage with 73.2% fewer paths than DFS and 68.5% fewer than BFS.
Figure 5: ROC curves on the AI-Gen-Malware dataset. CogniCrypt achieves an AUC of 0.993, significantly outperforming all baselines.
...and 3 more figures
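Figure 4 reports path reductions against two different baselines. If both percentages are measured against the same LLM-guided path count at the 95% coverage target, they jointly imply how BFS compares with DFS. The hypothetical helper below works through that arithmetic. The result is a derived quantity under this assumption, not a number reported in the paper.

```python
def implied_bfs_to_dfs_ratio(reduction_vs_dfs: float, reduction_vs_bfs: float) -> float:
    """Ratio N_bfs / N_dfs implied by two reductions measured against the same N_llm.

    N_llm = (1 - r_dfs) * N_dfs  and  N_llm = (1 - r_bfs) * N_bfs
    =>  N_bfs / N_dfs = (1 - r_dfs) / (1 - r_bfs)
    """
    return (1 - reduction_vs_dfs) / (1 - reduction_vs_bfs)


ratio = implied_bfs_to_dfs_ratio(0.732, 0.685)
print(f"{ratio:.3f}")  # 0.851
# Per 1,000 DFS paths: about 268 LLM-guided paths and about 851 BFS paths.
print(round(1000 * (1 - 0.732)), round(1000 * ratio))  # 268 851
```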

Theorems & Definitions (17)

Definition 1: Program Model
Definition 2: Execution Trace
Definition 3: Symbolic State
Definition 4: Path Constraint Space
Definition 5: Syntax of $\mathcal{L}_{\text{CogniCrypt}}$
Definition 6: Malicious Behavior Specification
Definition 7: Concolic Execution
Definition 8: Path Priority Function
Definition 9: Priority Queue Ordering
Theorem 1: Soundness
...and 7 more
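Definitions 8 and 9 (path priority function and priority queue ordering), together with the feedback loop in Figure 1, can be illustrated with a best-first exploration sketch. The definitions' exact content is not reproduced here, so this assumes a priority is a real-valued score over partial paths and that the queue always expands the highest-scoring path. The control-flow graph, the per-node weights standing in for the LLM prioritizer, the set-membership classifier, and the additive weight update (a simple reward adjustment, not the reinforcement learning used in the paper) are all hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]

# Hypothetical control-flow graph: node -> successors; nodes with no successors end a path.
CFG: Dict[str, List[str]] = {
    "entry": ["parse_args", "check_env"],
    "parse_args": ["print_help"],
    "check_env": ["read_cred"],
    "read_cred": ["net_send", "log_local"],
    "print_help": [],
    "net_send": [],
    "log_local": [],
}


def explore(
    cfg: Dict[str, List[str]],
    score: Callable[[Path], float],
    classify: Callable[[Path], bool],
    update: Callable[[Path, bool], None],
    budget: int,
) -> Tuple[List[Path], int]:
    """Best-first path exploration: always expand the highest-priority frontier path."""
    tie = itertools.count()  # breaks score ties in insertion order
    root: Path = ("entry",)
    frontier = [(-score(root), next(tie), root)]  # heapq is a min-heap, so negate scores
    findings: List[Path] = []
    expanded = 0
    while frontier and expanded < budget:
        _, _, path = heapq.heappop(frontier)
        expanded += 1
        successors = cfg[path[-1]]
        if not successors:  # complete path: classify it and feed the outcome back
            malicious = classify(path)
            update(path, malicious)
            if malicious:
                findings.append(path)
            continue
        for nxt in successors:
            child = path + (nxt,)
            # Priority is fixed at push time; feedback only affects later pushes.
            heapq.heappush(frontier, (-score(child), next(tie), child))
    return findings, expanded


# Hypothetical per-node weights standing in for the LLM prioritizer's judgment.
weights: Dict[str, float] = {"read_cred": 1.0, "net_send": 1.0}


def score(path: Path) -> float:
    return sum(weights.get(node, 0.0) for node in path)


def classify(path: Path) -> bool:
    # Stand-in for the learned path-constraint classifier.
    return {"read_cred", "net_send"} <= set(path)


def update(path: Path, malicious: bool, lr: float = 0.5) -> None:
    # Simple additive reward update; NOT the paper's RL procedure.
    for node in path:
        weights[node] = weights.get(node, 0.0) + (lr if malicious else -lr)


findings, expanded = explore(CFG, score, classify, update, budget=10)
print(findings)  # [('entry', 'check_env', 'read_cred', 'net_send')]
print(expanded)  # 7
```

With these weights the credential-exfiltration path is the first complete path classified, even though `parse_args` is pushed before `check_env`; a depth-first order that follows the successor lists would reach `print_help` first. Re-scoring the whole frontier after each update is a possible variant that lets feedback act immediately, at extra cost per step.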
