Can LLMs Hack Enterprise Networks? -- Replicated Computational Results (RCR) Report

Andreas Happe; Jürgen Cito

TL;DR

This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided within the prototype.

Abstract

This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?" The paper empirically investigates the efficacy and effectiveness of different LLMs for penetration-testing enterprise networks, i.e., Microsoft Active Directory Assumed-Breach Simulations. This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided within our prototype.

Paper Structure (29 sections, 3 figures, 2 tables)

This paper contains 29 sections, 3 figures, 2 tables.

Overview
Paper Motivation
Paper Contribution
Overview of the Replication Process
Artifacts
Prerequisites and Requirements
Hardware
Virtualization Infrastructure
GOAD Setup
Kali Linux Virtual Machine Setup
LLM Infrastructure
Setup the Cochise Prototype
Docker-based Installation
Manual Installation
Data Generation and Analysis
...and 14 more sections

Figures (3)

Figure 1: Using analyze-json-logs.py to create an overview of different runs performed by OpenAI's O1/GPT-4o.
Figure 2: Using analyze-json-logs.py to detail the token usage per prompt of a single test run. O1 reports reasoning tokens as part of the completion tokens.
Figure 3: Using cochise-replay.py to replay a log file. High-level plans (created by the Planner) are highlighted in green, tasks selected by the Planner and forwarded to the Executor are highlighted in yellow, and low-level Executor tool calls (executed commands) are not highlighted.
