The Power of Words: Generating PowerShell Attacks from Natural Language

Pietro Liguori, Christian Marescalco, Roberto Natella, Vittorio Orbinato, Luciano Pianese

TL;DR

The paper addresses the generation of offensive PowerShell code from natural language descriptions using neural machine translation (NMT) with a two-stage training pipeline: domain-adaptive pre-training on a large unlabeled PowerShell corpus, followed by supervised fine-tuning on a labeled NL-to-PowerShell dataset aligned to MITRE ATT&CK tactics. It evaluates three state-of-the-art NMT models (CodeT5+, CodeGPT, CodeGen) in zero-shot and several fine-tuning settings, analyzes the generated code both statically and dynamically, and compares the models against ChatGPT 3.5. Fine-tuning substantially improves syntactic and semantic matching: CodeGen does best with shorter fine-tuning, while CodeT5+ does best with longer fine-tuning combined with pre-training in some configurations. All fine-tuned models outperform ChatGPT on several metrics, and execution analysis indicates realistic adversarial behavior. The work contributes two novel PowerShell datasets, demonstrates the feasibility of automated offensive code generation in this domain, and highlights both its potential utility for adversary emulation and the ethical considerations needed to mitigate misuse.

Abstract

As the Windows OS stands out as one of the most targeted systems, the PowerShell language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation purposes, we propose two novel datasets with PowerShell code samples, one with manually curated descriptions in natural language and another code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that tuning NMT using our dataset is effective at generating offensive PowerShell code. Comparative analysis against the most widely used LLM service ChatGPT reveals the specialized strengths of our fine-tuned models.

Paper Structure

This paper contains 16 sections, 3 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Overview of our research study.
  • Figure 2: Mapping of fine-tuning dataset samples on the MITRE ATT&CK tactics.
  • Figure 3: Static analysis workflow.
  • Figure 4: Counts for different warning types in each test set.
  • Figure 5: Execution analysis workflow.
  • ...and 3 more figures