Evolutionary Defense: Advancing Moving Target Strategies with Bio-Inspired Reinforcement Learning to Secure Misconfigured Software Applications
Niloofar Heidarikohol, Shuvalaxmi Dass, Akbar Siami Namin
TL;DR
Security misconfiguration poses a critical risk in configurable software, motivating a proactive Moving Target Defense (MTD) built on a Monte Carlo RL controller (RL-MTD). The authors identify a sparse, unoptimized search space as a bottleneck and mitigate it by integrating two bio-inspired optimizers, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to produce GA-RL and PSO-RL. Across four misconfigured SUTs, both GA-RL and PSO-RL outperform the baseline RL-MTD, with PSO-RL slightly ahead of GA-RL on most SUTs, demonstrating the value of an optimized search space for generating dynamic secure configurations. The work is a proof of concept that augments MTD with evolutionary search to strengthen defense against misconfigurations, and it points to future directions such as two-player dynamics and inter-application interactions.
Abstract
Improper configurations in software systems often create vulnerabilities, leaving them open to exploitation. Static architectures exacerbate this issue by allowing misconfigurations to persist, giving adversaries opportunities to exploit them during attacks. To address this challenge, a dynamic, proactive defense strategy known as Moving Target Defense (MTD) can be applied. MTD continually changes the attack surface of the system, thwarting potential threats. In previous research, we developed a proof of concept for a single-player MTD game model called RL-MTD, which uses Reinforcement Learning (RL) to generate dynamic secure configurations. While the model performed satisfactorily in generating secure configurations, it suffered from an unoptimized and sparse search space, which led to performance issues. This paper addresses the search space optimization problem by leveraging two bio-inspired search algorithms: Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). We extend our base RL-MTD model by integrating these algorithms, resulting in PSO-RL and GA-RL. We compare the performance of three models, base RL-MTD, GA-RL, and PSO-RL, across four misconfigured SUTs in terms of generating the most secure configuration. Results show that the optimized search space derived by both GA-RL and PSO-RL significantly improves on the performance of the base RL-MTD model, which lacks an optimized search space. While both GA-RL and PSO-RL demonstrate effective search capabilities, PSO-RL slightly outperforms GA-RL for most SUTs. Overall, both algorithms excel at finding an optimal search space, which in turn improves the model's performance in generating optimal secure configurations.
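
To make the idea of search space optimization concrete, the following is a minimal sketch of a textbook PSO loop over configuration vectors. It is not the authors' implementation: the encoding of a configuration as a list of values in [0, 1], the `fitness` function (similarity to a hypothetical secure reference `secure_ref`), and all parameter values (`n_particles`, `w`, `c1`, `c2`) are assumptions chosen only for illustration.

```python
import random


def fitness(config, secure_ref):
    # Hypothetical security score (an assumption, not the paper's metric):
    # 1.0 when config matches secure_ref exactly, lower as it drifts away.
    dist = sum((c - s) ** 2 for c, s in zip(config, secure_ref)) / len(config)
    return 1.0 - dist


def pso_search(secure_ref, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(secure_ref)
    # Each particle is one candidate configuration with settings in [0, 1].
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, secure_ref) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO update: inertia + pull toward the particle's
                # own best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every setting stays inside its valid range.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i], secure_ref)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

    # pbest is the refined pool of candidates; gbest is the single best one.
    return gbest, gbest_fit, pbest
```

In this sketch, the returned `pbest` pool stands in for the optimized search space: a downstream RL controller could start from these refined candidates rather than from sparse random configurations. How the paper actually couples the optimizer to RL-MTD is not specified in the abstract, so that hand-off is an assumption as well.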
