Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN
Norman Becker, Daniel Reti, Evridiki V. Ntagiou, Marcus Wallum, Hans D. Schotten
TL;DR
The paper investigates autonomous penetration testing using reinforcement learning by comparing Q-learning, DQN, and A3C within an extended NASim environment. It introduces three attack scenarios (Exploits, Wiretapping, and Post-exploitation) and a Penbox baseline, then conducts a large hyperparameter grid search to identify effective agents. Results show that A3C solves all scenarios and generalizes to unseen permutations, outperforming the baseline in action efficiency, while DQN fails on multi-environment tasks and Q-learning struggles beyond Stage1. The study demonstrates the viability of RL-based autonomous pentesting in simplified, discrete environments and discusses limitations around generalization, overfitting, and real-world applicability, pointing to future work on scenario generation and more diverse topologies. Overall, the work provides a promising foundation for scalable, automated security testing, with implications for faster vulnerability assessment and red-teaming support.
Abstract
Penetration testing is the process of searching for security weaknesses by simulating an attack. It is usually performed by experienced professionals who apply scanning and attack tools. Automating the execution of such tools could reduce the need for human interaction and decision-making. In this work, the Network Attack Simulator (NASim) was used as an environment to train reinforcement learning agents to solve three predefined security scenarios. These scenarios cover exploitation, post-exploitation and wiretapping techniques. A large hyperparameter grid search was performed to find the best hyperparameter combinations. Of the three algorithms used (Q-learning, DQN and A3C), A3C was able to solve all scenarios and achieve generalization. In addition, A3C solved these scenarios with fewer actions than the automated penetration testing baseline. Although training was performed on rather small scenarios with small state and action spaces for the agents, the results show that an RL agent can successfully perform a penetration test.
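To make the combination of tabular Q-learning and a hyperparameter grid search more concrete, the following is a minimal sketch in Python. It does not use NASim and is not the authors' setup: the toy chain of hosts, the reward values, the function names, and the grid values are all assumptions chosen for illustration. The only element taken from the standard algorithm is the Q-learning update rule itself.

```python
import itertools
import random


def q_learning_update(q, state, action, reward, next_state, n_actions, alpha, gamma):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def train_on_toy_chain(alpha, gamma, epsilon, episodes=200, n_hosts=4, seed=0):
    """Hypothetical stand-in for a pentest scenario: hosts form a chain,
    action 1 compromises the next host, action 0 wastes a step."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # step limit per episode (assumed)
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = max(range(2), key=lambda a: q.get((state, a), 0.0))
            next_state = state + 1 if action == 1 else state
            done = next_state == n_hosts - 1
            reward = 10.0 if done else -1.0  # assumed: goal bonus, per-action cost
            q_learning_update(q, state, action, reward, next_state, 2, alpha, gamma)
            state = next_state
            if done:
                break
    return q


def greedy_steps(q, n_hosts=4, limit=50):
    """Number of actions the greedy policy needs to reach the last host."""
    state, steps = 0, 0
    while state != n_hosts - 1 and steps < limit:
        action = max(range(2), key=lambda a: q.get((state, a), 0.0))
        state = state + 1 if action == 1 else state
        steps += 1
    return steps


# Illustrative grid; the values are assumptions, not those used in the paper.
grid = {"alpha": [0.1, 0.5], "gamma": [0.9, 0.99], "epsilon": [0.1, 0.3]}
results = []
for alpha, gamma, epsilon in itertools.product(*grid.values()):
    q_table = train_on_toy_chain(alpha, gamma, epsilon)
    results.append(((alpha, gamma, epsilon), greedy_steps(q_table)))

# Rank configurations by action efficiency (fewer actions is better).
best_params, best_steps = min(results, key=lambda item: item[1])
print(best_params, best_steps)
```

Ranking configurations by the number of actions needed mirrors the action-efficiency comparison mentioned in the abstract, although a real evaluation would average over several seeds and scenarios rather than a single run.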
