Knowledge-Informed Auto-Penetration Testing Based on Reinforcement Learning with Reward Machine

Yuanliang Li; Hanzheng Dai; Jun Yan

TL;DR

The paper tackles automated penetration testing (AutoPT) using reinforcement learning guided by domain knowledge expressed as reward machines (RMs). It introduces DRLRM-PT, which formulates PT as a POMDP and uses two RM designs to decompose lateral-movement tasks into phases and assign phase-specific rewards; the policy is optimized with deep Q-learning with RM (DQRM). Empirical evaluation on CyberBattleSim shows that RM-guided agents learn faster and achieve better PT efficiency than baselines without knowledge embedding, and that the RM encoding more detailed domain knowledge (RM2) yields the best performance. The work indicates that integrating cybersecurity knowledge bases into RL can improve sample efficiency, interpretability, and outcome quality, with practical implications for scalable, automated security testing.

Abstract

Automated penetration testing (AutoPT) based on reinforcement learning (RL) has proven its ability to improve the efficiency of vulnerability identification in information systems. However, RL-based PT encounters several challenges, including poor sampling efficiency, intricate reward specification, and limited interpretability. To address these issues, we propose a knowledge-informed AutoPT framework called DRLRM-PT, which leverages reward machines (RMs) to encode domain knowledge as guidelines for training a PT policy. In our study, we specifically focus on lateral movement as a PT case study and formulate it as a partially observable Markov decision process (POMDP) guided by RMs. We design two RMs based on the MITRE ATT&CK knowledge base for lateral movement. To solve the POMDP and optimize the PT policy, we employ the deep Q-learning algorithm with RM (DQRM). The experimental results demonstrate that the DQRM agent exhibits higher training efficiency in PT compared to agents without knowledge embedding. Moreover, RMs encoding more detailed domain knowledge demonstrate better PT performance compared to RMs with simpler knowledge.
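To make the idea of a reward machine concrete, the following is a minimal sketch of an RM as a finite-state machine that advances on high-level events and emits a phase-specific reward. It is an illustration only and does not reproduce the paper's RMs: the state names (`u0`, `u1`, `u2`, `u_goal`), the event labels (`found_credential`, `connected_node`, `escalated`), and the reward values are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Toy reward machine: (state, event) -> (next_state, reward)."""
    initial: str
    terminal: set
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def reset(self):
        """Return to the initial RM state at the start of an episode."""
        self.state = self.initial

    def step(self, events):
        """Advance on the first matching event and return its reward.

        `events` is the set of high-level event labels detected after the
        agent's latest action. Events are checked in sorted order so the
        result is deterministic. If nothing matches, the RM state is
        unchanged and the reward is 0.0.
        """
        for event in sorted(events):
            key = (self.state, event)
            if key in self.transitions:
                self.state, reward = self.transitions[key]
                return reward
        return 0.0

    def done(self):
        """True once the RM has reached a terminal state."""
        return self.state in self.terminal


def build_toy_rm():
    # Hypothetical three-phase lateral-movement task (not the paper's RM):
    # find a credential, connect to a new node, then escalate privileges.
    transitions = {
        ("u0", "found_credential"): ("u1", 1.0),
        ("u1", "connected_node"): ("u2", 1.0),
        ("u2", "escalated"): ("u_goal", 10.0),
    }
    return RewardMachine(initial="u0", terminal={"u_goal"}, transitions=transitions)


if __name__ == "__main__":
    rm = build_toy_rm()
    trace = [{"found_credential"}, set(), {"connected_node"}, {"escalated"}]
    rewards = [rm.step(events) for events in trace]
    print(rewards, rm.done())
```

In a setup of this kind, the RM state would typically be supplied to the agent alongside the environment observation, so that the policy can condition on the current phase of the task. How the paper's DQRM agent actually consumes the RM state is not specified in this summary.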

Paper Structure (21 sections, 3 equations, 9 figures, 5 tables, 1 algorithm)

Introduction
Background of AI-Powered AutoPT
Challenges of AI-Powered AutoPT
Contributions of This Work
Knowledge-Informed AutoPT Framework
DRLRM-PT Framework
POMDP with Reward Machine Formulation for PT
POMDP with RM Design for Lateral Movement
Action Space
Observation Space
Reward Machine I ($\mathcal{R}_1$)
Reward Machine II ($\mathcal{R}_2$)
PT Objective
PT Policy Optimization Based on DQRM
Simulation Platform and Testing Environments
...and 6 more sections

Figures (9)

Figure 1: The proposed knowledge-informed AutoPT framework (DRLRM-PT).
Figure 2: The diagram of Reward Machine I ($\mathcal{R}_1$).
Figure 3: The diagram of Reward Machine II ($\mathcal{R}_2$).
Figure 4: CyberBattleChain environment (env-1).
Figure 5: CyberBattleToyCtf environment (env-2).
...and 4 more figures
