Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN

Norman Becker; Daniel Reti; Evridiki V. Ntagiou; Marcus Wallum; Hans D. Schotten

TL;DR

The paper investigates autonomous penetration testing with reinforcement learning by comparing Q-learning, DQN, and A3C in an extended NASim environment. It introduces three attack scenarios (Exploits, Wiretapping, and Post-exploitation) and a Penbox baseline, then runs a large hyperparameter grid search to identify effective agents. A3C solves all scenarios and generalizes to unseen permutations, needing fewer actions than the baseline, while DQN fails on multi-environment tasks and Q-learning struggles beyond Stage1. The study shows that RL-based autonomous pentesting is viable in simplified, discrete environments and discusses limitations around generalization, overfitting, and real-world applicability, pointing to future work on scenario generation and more diverse topologies. Overall, the work offers a foundation for scalable, automated security testing, with implications for faster vulnerability assessment and red-teaming support.

Abstract

Penetration testing is the process of searching for security weaknesses by simulating an attack. It is usually performed by experienced professionals who apply scanning and attack tools. By automating the execution of such tools, the need for human interaction and decision-making could be reduced. In this work, a Network Attack Simulator (NASim) was used as an environment to train reinforcement learning agents to solve three predefined security scenarios. These scenarios cover techniques of exploitation, post-exploitation and wiretapping. A large hyperparameter grid search was performed to find the best hyperparameter combinations. The algorithms Q-learning, DQN and A3C were used; of these, A3C was able to solve all scenarios and achieve generalization. In addition, A3C could solve these scenarios with fewer actions than the automated penetration testing baseline. Although the training was performed on rather small scenarios and with small state and action spaces for the agents, the results show that a penetration test can successfully be performed by the RL agent.
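
The grid search mentioned in the abstract can be pictured with a minimal sketch. Everything here is an assumption for illustration: the parameter names, the candidate values, and the `evaluate` placeholder are hypothetical and are not the grid or the scoring used in the paper. The sketch only shows the general idea of enumerating every combination of hyperparameter values and keeping the best-scoring one.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only,
# not the grid used in the paper.
search_space = {
    "learning_rate": [1e-4, 1e-3],
    "gamma": [0.7, 0.9, 0.99],
    "replay_buffer": [1000, 10000],
}


def expand_grid(space):
    """Return one dict per combination of the listed values."""
    names = list(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def evaluate(config):
    """Placeholder score (hypothetical).

    A real run would train an agent with this configuration and
    return a measured quantity such as mean episode reward.
    """
    return -abs(config["gamma"] - 0.9) - config["learning_rate"]


configs = expand_grid(search_space)

# Keep the configuration with the highest score; ties go to the first one seen.
best = max(configs, key=evaluate)
```

The number of configurations is the product of the list lengths (here 2 × 3 × 2), which is why grid searches grow quickly as more hyperparameters or candidate values are added.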

Paper Structure (29 sections, 1 equation, 8 figures, 5 tables)

Introduction
Penetration Testing
Fundamentals
Scope and Phases
Automation of Pentesting
Penbox
Related Work
Experiment
Scenarios
Exploits
Wiretapping
Post-exploitation
Penbox
NASim and applied modifications
Reinforcement Learning Algorithms
...and 14 more sections

Figures (8)

Figure 1: Scenario A: Exploits. Red-marked services can be exploited to achieve root access.
Figure 2: Scenario B: Wiretapping. Red-marked GET requests transmit credentials in clear text.
Figure 3: Scenario C: Post-exploitation. Red-marked services can be exploited; afterwards, credentials are found because of a weak hash.
Figure 4: Penbox execution chain
Figure 5: DQN agent trained on one permutation of Scenario A, with the following hyperparameters: q_func: ['FCStateQFunctionWithDiscreteAction', 2, 50]; optimizer: ['adam', 0.0001]; replay_buffer: 1000; gamma: 0.7; explorer: ['LinearDecayEpsilonGreedy', 0.1, 60000]; target_update_intervall: 1000; update_intervall: 1; replay_start_size: 1000.
...and 3 more figures
