Can LLMs Hack Enterprise Networks? -- Replicated Computational Results (RCR) Report

Andreas Happe, Jürgen Cito

TL;DR

This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided with the prototype.

Abstract

This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?" The paper empirically investigates the efficacy and effectiveness of different LLMs for penetration-testing enterprise networks, i.e., Microsoft Active Directory Assumed-Breach Simulations. This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided with our prototype.

Paper Structure (29 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Using analyze-json-logs.py to create an overview of different runs performed by OpenAI's O1/GPT-4o.
  • Figure 2: Using analyze-json-logs.py to detail the token usage per prompt of a single test run. O1 reports reasoning tokens as part of the completion tokens.
  • Figure 3: Using cochise-replay.py to replay a log file. High-level plans (created by the Planner) are highlighted in green; tasks selected by the Planner and forwarded to the Executor are highlighted in yellow; low-level Executor tool calls (executed commands) are not highlighted.