Adaptive Network Security Policies via Belief Aggregation and Rollout
Kim Hammar, Yuchao Li, Tansu Alpcan, Emil C. Lupu, Dimitri Bertsekas
TL;DR
The paper addresses the challenge of adapting network security policies under uncertainty and changing conditions by formulating policy adaptation as a partially observable decision problem. It introduces a scalable three-component framework: belief estimation via particle filtering, offline base policy computation through feature-based belief aggregation, and online policy adaptation via rollout with lookahead. The authors establish a bound on the aggregation error, prove that rollout-based adaptation improves policy quality under certain conditions, and demonstrate state-of-the-art performance on the CAGE-2 benchmark as well as practical viability in a testbed. The approach offers fast adaptation with theoretical guarantees and applies broadly to partially observable dynamic systems beyond security. Overall, the work provides a principled, scalable path to reliable automatic policy adaptation in responsive security infrastructures.
Abstract
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
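As an illustration of the first component, the following is a minimal sketch of one belief update with a standard bootstrap particle filter. It is not the authors' implementation: the function names, the two-state toy model (0 = healthy, 1 = compromised), and all probabilities are hypothetical and chosen only to make the example runnable. The variant used in the paper may differ.

```python
import random


def particle_filter_step(particles, action, observation,
                         transition, obs_likelihood, rng):
    """One bootstrap particle-filter update of a belief held as sampled states."""
    # Propagate each particle through the simulated dynamics.
    predicted = [transition(s, action, rng) for s in particles]
    # Weight each propagated particle by how well it explains the observation.
    weights = [obs_likelihood(observation, s) for s in predicted]
    if sum(weights) == 0.0:
        # No particle explains the observation; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample with replacement in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Hypothetical toy model (an assumption, not taken from the paper).
def toy_transition(state, action, rng):
    if action == "recover":
        return 0  # recovery resets the node to healthy
    if state == 1:
        return 1  # a compromised node stays compromised
    return 1 if rng.random() < 0.1 else 0  # assumed 10% chance of compromise


def toy_obs_likelihood(observation, state):
    # Assumed alert probabilities: 0.8 if compromised, 0.2 if healthy.
    p_alert = 0.8 if state == 1 else 0.2
    return p_alert if observation == 1 else 1.0 - p_alert


def compromise_probability(particles):
    # Fraction of particles in the compromised state.
    return sum(particles) / len(particles)


rng = random.Random(0)
belief = [0] * 1000  # start certain that the node is healthy
for _ in range(3):   # three consecutive alerts while waiting
    belief = particle_filter_step(belief, "wait", 1,
                                  toy_transition, toy_obs_likelihood, rng)
print(compromise_probability(belief))
```

In this toy model, repeated alerts move the estimated probability of compromise upward, and a "recover" action resets it to zero. A belief estimate of this kind is the input on which the offline and online policy components would operate.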
