Multi-Agent Reinforcement Learning for Maritime Operational Technology Cyber Security

Alec Wilson; Ryan Menzies; Neela Morarji; David Foster; Marco Casassa Mont; Esin Turkbeyler; Lisa Gralewski

TL;DR

This work targets maritime OT cyber security by introducing IPMSRL, a configurable MARL environment that simulates an Integrated Platform Management System under cyber-attack. It compares two PPO-based methods, IPPO and MAPPO, and shows that a MAPPO variant with a shared, centralised critic learns faster and more stably and defends better: MAPPO reached an optimal policy (episode outcome mean of 1) after $8\times 10^5$ timesteps, whereas IPPO reached an episode outcome mean of only 0.966 after $10^6$ timesteps. The simulation draws on MITRE ATT&CK ICS for attack realism, uses remediation actions inspired by NIST SP-800-61, and blends intrinsic and global reward signals to address sparse rewards; the results show that alert quality and reward shaping are critical to effective autonomous defence. The findings highlight the potential of, and practical considerations for, deploying autonomous cyber defence in OT on ships, while noting sim-to-real gaps and the need for strategies that generalise across attack types and network topologies.
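The difference between the two methods lies in what each critic observes. The following is a minimal sketch of that distinction, not the paper's implementation: the agent count, observation size, and the toy linear critic (standing in for a neural network) are all hypothetical, and the observations are random placeholders rather than IPMSRL data.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3  # hypothetical number of defender agents
OBS_DIM = 4   # hypothetical size of each agent's local observation


def linear_value(weights, features):
    """Toy linear critic V = w . x, standing in for a value network."""
    return float(weights @ features)


# One local observation per agent (random placeholders, not IPMSRL data).
observations = rng.normal(size=(N_AGENTS, OBS_DIM))

# IPPO: each agent owns a critic that sees only that agent's observation.
ippo_weights = rng.normal(size=(N_AGENTS, OBS_DIM))
ippo_values = [
    linear_value(ippo_weights[i], observations[i]) for i in range(N_AGENTS)
]

# MAPPO with a shared critic: a single critic sees the joint observation,
# here formed by concatenating all local observations.
joint_observation = observations.reshape(-1)
mappo_weights = rng.normal(size=N_AGENTS * OBS_DIM)
mappo_value = linear_value(mappo_weights, joint_observation)

print("IPPO value estimates (one per agent):", ippo_values)
print("MAPPO value estimate (one for the team):", mappo_value)
```

In both cases the actors still act on local observations; only the value estimate used during training differs.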

Abstract

This paper demonstrates the potential for autonomous cyber defence to be applied on industrial control systems and provides a baseline environment for further exploring the application of Multi-Agent Reinforcement Learning (MARL) to this problem domain. It introduces a simulation environment, IPMSRL, of a generic Integrated Platform Management System (IPMS) and explores the use of MARL for autonomous cyber defence decision-making on generic maritime-based IPMS Operational Technology (OT). OT cyber defensive actions are less mature than they are for Enterprise IT. This is due to the relatively brittle nature of OT infrastructure, which originates from the use of legacy systems, design-time engineering assumptions, and a lack of full-scale modern security controls. There are many obstacles to be tackled across the cyber landscape due to continually increasing cyber-attack sophistication and the limitations of traditional IT-centric cyber defence solutions. Traditional IT controls are rarely deployed on OT infrastructure, and where they are, some threats are not fully addressed. In our experiments, a shared critic implementation of Multi-Agent Proximal Policy Optimisation (MAPPO) outperformed Independent Proximal Policy Optimisation (IPPO). MAPPO reached an optimal policy (episode outcome mean of 1) after 800K timesteps, whereas IPPO was only able to reach an episode outcome mean of 0.966 after one million timesteps. Hyperparameter tuning greatly improved training performance. Over one million timesteps, training with the tuned hyperparameters reached an optimal policy, whereas the default hyperparameters won only sporadically, with most simulations ending in a draw. We also tested a real-world constraint, attack detection alert success, and found that when the alert success probability was reduced to 0.75 or 0.9, the MARL defenders still won more than 97.5% or 99.5% of episodes, respectively.
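To make the alert success constraint and the episode outcome metric concrete, here is a minimal sketch under stated assumptions. It assumes each attacker action raises an alert as an independent Bernoulli draw, and that episode outcomes are encoded as 1 for a defender win and 0 for a draw; neither the encoding nor the function names (`alert_raised`, `detection_rate`, `episode_outcome_mean`) come from the paper.

```python
import random


def alert_raised(alert_success_probability, rng):
    """Assumed model: an alert fires with the given probability."""
    return rng.random() < alert_success_probability


def detection_rate(alert_success_probability, n_attacks, seed=0):
    """Fraction of simulated attacker actions that produce an alert."""
    rng = random.Random(seed)
    hits = sum(
        alert_raised(alert_success_probability, rng) for _ in range(n_attacks)
    )
    return hits / n_attacks


def episode_outcome_mean(outcomes):
    """Mean of per-episode outcomes (assumed encoding: 1 = win, 0 = draw)."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    for p in (0.75, 0.9, 1.0):
        print(f"alert success probability {p}: "
              f"empirical rate {detection_rate(p, 20000):.3f}")
    # Hypothetical outcomes: three wins and one draw.
    print("episode outcome mean:", episode_outcome_mean([1, 1, 0, 1]))
```

Under this encoding, an episode outcome mean of 1 means the defenders won every evaluated episode, which is how the abstract characterises an optimal policy.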

Paper Structure (16 sections, 7 figures, 1 table)

Introduction
Operational Technology Scenario
IPMSRL - Simulation Environment
MITRE ATT&CK® Framework
Attacker
Alerts
NIST SP-800-61
Defensive Remedial Actions
Reward Function
Multi-Agent Reinforcement Learning (MARL)
Hyperparameter Tuning with MAPPO
IPPO vs MAPPO
Experimental Results
Alert Success Probability
Reward Experiments
...and 1 more section

Figures (7)

Figure 1: Example of an IPMSRL Network Topology.
Figure 2: MITRE ATT&CK® ICS Tactics with the number of techniques for each tactic (MITRE, 2023).
Figure 3: Summary of NIST SP-800-61 Remedial Actions, adapted to targeted Maritime / OT scenario.
Figure 4: Comparison between default and tuned hyperparameters on MAPPO with 90% CI.
Figure 5: IPPO vs MAPPO experiment with 90% CI.
...and 2 more figures
