Out-of-Distribution Detection for Neurosymbolic Autonomous Cyber Agents
Ankita Samaddar, Nicholas Potteiger, Xenofon Koutsoukos
TL;DR
The paper addresses the trustworthiness of RL-based autonomous cyber-defense agents operating under uncertain runtime conditions. It introduces an OOD monitoring framework that uses a Probabilistic Neural Network (PNN) to learn discrete system dynamics and detect out-of-distribution transitions, and it integrates this safety layer with a neurosymbolic agent built on Evolving Behavior Trees. In CybORG Scenario 2 experiments with the adversarial strategies Meander and B_line, the approach demonstrates effective OOD detection and safe handling of strategy switches, and shows the need for a GetSafeAction! node that restores the system to a safe state. Overall, the work provides a practical runtime safety mechanism for discrete-state neurosymbolic cyber defenses and outlines avenues for real-world emulation and online learning of adversaries.
Abstract
Autonomous agents for cyber applications take advantage of modern defense techniques by combining conventional and learning-enabled components. These intelligent agents are trained via reinforcement learning (RL) algorithms and can learn, adapt to, reason about, and deploy security rules to defend networked computer systems while maintaining critical operational workflows. However, the knowledge available during training about the state of the operational network and its environment may be limited. The agents should be trustworthy, so that they can reliably detect situations they cannot handle and hand them over to cyber experts. In this work, we develop an out-of-distribution (OOD) monitoring algorithm that uses a Probabilistic Neural Network (PNN) to detect anomalous or OOD situations for RL-based agents with discrete states and discrete actions. To demonstrate the effectiveness of the proposed approach, we integrate the OOD monitoring algorithm with a neurosymbolic autonomous cyber agent that uses behavior trees with learning-enabled components. We evaluate the proposed approach in a simulated cyber environment under different adversarial strategies. Experimental results over a large number of episodes illustrate the overall efficiency of our proposed approach.
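To make the monitoring idea concrete, the following is a minimal sketch of OOD detection over discrete states and discrete actions. It is not the paper's algorithm: a count-based empirical transition table stands in for the PNN, and the class name `TransitionMonitor`, the `threshold` parameter, and the state and action labels are all hypothetical. The sketch flags a transition as OOD when the model assigns the observed next state a probability below the threshold.

```python
from collections import Counter, defaultdict


class TransitionMonitor:
    """Count-based stand-in for a learned probabilistic dynamics model.

    Assumption: the paper uses a Probabilistic Neural Network here; this
    table of empirical frequencies only illustrates the monitoring logic.
    """

    def __init__(self, threshold=0.05):
        # Hypothetical cutoff: transitions less likely than this are OOD.
        self.threshold = threshold
        # Maps (state, action) to a Counter of observed next states.
        self.counts = defaultdict(Counter)

    def fit(self, transitions):
        """Record (state, action, next_state) triples from training episodes."""
        for state, action, next_state in transitions:
            self.counts[(state, action)][next_state] += 1

    def probability(self, state, action, next_state):
        """Estimate P(next_state | state, action); 0.0 for unseen pairs."""
        seen = self.counts.get((state, action))
        if not seen:
            return 0.0
        return seen[next_state] / sum(seen.values())

    def is_ood(self, state, action, next_state):
        """Flag the transition when its estimated probability is too low."""
        return self.probability(state, action, next_state) < self.threshold


# Illustrative training data with made-up state and action labels.
training = [("clean", "monitor", "clean")] * 9
training += [("clean", "monitor", "scanned")]

monitor = TransitionMonitor(threshold=0.05)
monitor.fit(training)

# A transition seen in training is in-distribution.
print(monitor.is_ood("clean", "monitor", "scanned"))
# A next state never observed for this (state, action) pair is flagged.
print(monitor.is_ood("clean", "monitor", "compromised"))
```

In a deployed agent, a flagged transition would be the point at which control passes to a fallback behavior or a human expert. A learned model such as a PNN would replace the frequency table so that the monitor can generalize beyond exactly the pairs seen in training.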
