AutoPentester: An LLM Agent-based Framework for Automated Pentesting

Yasod Ginige, Akila Niroshan, Sajal Jain, Suranga Seneviratne

TL;DR

AutoPentester is an LLM agent-based framework that automates end-to-end pentesting with five core agents plus supporting modules. It uses Retrieval-Augmented Generation (RAG) for command synthesis and a Findings-driven Strategy Analyzer with a Pentest Tree for strategic planning. Two supporting components, the Repetition Identifier and the Results Verifier, improve robustness, and all components work through a CLI-based Agent Computer Interface. In evaluations on Hack The Box machines and custom VMs, AutoPentester outperforms the PentestGPT baseline, with a 27.0% better subtask completion rate and 39.5% more vulnerability coverage, while taking fewer steps and requiring significantly less human input. Qualitative feedback from security professionals further supports its practicality for enterprise red-team and assessment tasks. Limitations include challenges with GUI-heavy targets and knowledge-base maintenance, which point to future work in GUI integration, broader tool support, and strategy learning through reinforcement learning and fine-tuning of LLMs for pentesting planning.
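To make the Pentest Tree idea more concrete, here is a minimal Python sketch of a task tree whose nodes collect findings as attributes. It is an illustration under assumptions, not the authors' data structure: the `PTTNode` class, its fields, the `next_open_task` helper, and the depth-first selection rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PTTNode:
    """One task in a hypothetical pentest tree (names are assumptions)."""
    task: str
    done: bool = False
    findings: Dict[str, str] = field(default_factory=dict)
    children: List["PTTNode"] = field(default_factory=list)

    def add_child(self, task: str) -> "PTTNode":
        """Attach a follow-up task suggested by this node's findings."""
        child = PTTNode(task)
        self.children.append(child)
        return child


def next_open_task(node: PTTNode) -> Optional[PTTNode]:
    """Depth-first search for the first task not yet marked done."""
    if not node.done:
        return node
    for child in node.children:
        found = next_open_task(child)
        if found is not None:
            return found
    return None


# Build a tiny tree: findings from the first step become node attributes
# and motivate two follow-up tasks.
root = PTTNode("reconnaissance")
root.findings["open_ports"] = "22, 80"
root.done = True
web = root.add_child("web enumeration")
ssh = root.add_child("ssh checks")
```

A planner could call `next_open_task(root)` on each iteration to decide where to continue; a real strategy analyzer would presumably weigh the findings rather than take the first open node.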

Abstract

Penetration testing and vulnerability assessment are essential industry practices for safeguarding computer systems. As cyber threats grow in scale and complexity, the demand for pentesting has surged, surpassing the capacity of human professionals to meet it effectively. With advances in AI, particularly Large Language Models (LLMs), there have been attempts to automate the pentesting process. However, existing tools such as PentestGPT are still semi-manual, requiring significant professional human interaction to conduct pentests. To this end, we propose a novel LLM agent-based framework, AutoPentester, which automates the pentesting process. Given a target IP, AutoPentester automatically conducts pentesting steps using common security tools in an iterative process. It can dynamically generate attack strategies based on the tool outputs from the previous iteration, mimicking the human pentester approach. We evaluate AutoPentester using Hack The Box and custom-made VMs, comparing the results with the state-of-the-art PentestGPT. Results show that AutoPentester achieves a 27.0% better subtask completion rate and 39.5% more vulnerability coverage with fewer steps. Most importantly, it requires significantly fewer human interactions and interventions compared to PentestGPT. Furthermore, we recruit a group of security industry professional volunteers for a user survey and perform a qualitative analysis to evaluate AutoPentester against industry practices and compare it with PentestGPT. On average, AutoPentester received a score of 3.93 out of 5 based on user reviews, which was 19.8% higher than PentestGPT.
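The iterative process described in the abstract, where each step is chosen from the tool outputs of the previous iteration, can be sketched as a simple loop. The sketch below is a simplified illustration and not the AutoPentester implementation: `PentestState`, `run_loop`, `toy_planner`, and `toy_executor` are hypothetical names, the planner stands in for an LLM call, the executor returns canned strings instead of running any tool, and the IP address comes from a documentation-only range.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PentestState:
    """Everything the loop has learned about one target so far."""
    target_ip: str
    findings: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


def run_loop(
    state: PentestState,
    plan_next: Callable[[PentestState], str],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> PentestState:
    """Plan a step from prior findings, run it, record the output, repeat."""
    for _ in range(max_steps):
        command = plan_next(state)
        if not command or command in state.history:
            break  # nothing new to try, so stop instead of repeating a step
        state.history.append(command)
        output = execute(command)
        if output:
            state.findings.append(output)
    return state


# Hypothetical stand-ins: a real system would call an LLM and real tools here.
def toy_planner(state: PentestState) -> str:
    if not state.findings:
        return f"scan {state.target_ip}"
    if len(state.findings) == 1 and "port 80 open" in state.findings[0]:
        return f"enumerate-web {state.target_ip}"
    return ""  # no further strategy for this toy example


def toy_executor(command: str) -> str:
    canned = {
        "scan": "port 80 open",
        "enumerate-web": "login page found at /admin",
    }
    return canned.get(command.split()[0], "")


final_state = run_loop(PentestState("192.0.2.10"), toy_planner, toy_executor)
```

The check `command in state.history` is a crude stand-in for repetition detection; it only catches exact duplicates, whereas the Repetition Identifier named in the TL;DR is a separate component whose logic is not described in this section.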

Paper Structure

This paper contains 22 sections, 7 figures, and 8 tables.

Figures (7)

  • Figure 1: AutoPentester Framework (LLM icons indicate separate API sessions with an LLM).
  • Figure 2: An example partial PTT (findings from each step are progressively added as attributes).
  • Figure 3: The functionality of the Generator Agent.
  • Figure 4: Examples of each module’s functionality.
  • Figure 5: Results of the user study.
  • ...and 2 more figures