Evolutionary Defense: Advancing Moving Target Strategies with Bio-Inspired Reinforcement Learning to Secure Misconfigured Software Applications

Niloofar Heidarikohol, Shuvalaxmi Dass, Akbar Siami Namin

TL;DR

Security misconfiguration poses a critical risk in configurable software, motivating a proactive Moving Target Defense (MTD) built atop a Monte Carlo RL controller (RL-MTD). The authors identify a sparse, unoptimized search space as a bottleneck and mitigate it by integrating bio-inspired optimizers—Genetic Algorithm (GA) and Particle Swarm Optimization (PSO)—to produce GA-RL and PSO-RL. Across four misconfigured SUTs, both GA-RL and PSO-RL outperform the baseline RL-MTD, with PSO-RL typically delivering the best results, demonstrating the value of optimized search spaces for dynamic secure configurations. The work presents a proof-of-concept that augments MTD with evolutionary search to enhance defense against misconfigurations and points to future directions such as two-player dynamics and inter-application interactions.

Abstract

Improper configurations in software systems often create vulnerabilities, leaving them open to exploitation. Static architectures exacerbate this issue by allowing misconfigurations to persist, providing adversaries with opportunities to exploit them during attacks. To address this challenge, a dynamic proactive defense strategy known as Moving Target Defense (MTD) can be applied. MTD continually changes the attack surface of the system, thwarting potential threats. In previous research, we developed a proof of concept for a single-player MTD game model called RL-MTD, which utilizes Reinforcement Learning (RL) to generate dynamic secure configurations. While the model exhibited satisfactory performance in generating secure configurations, it grappled with an unoptimized and sparse search space, leading to performance issues. To tackle this obstacle, this paper addresses the search space optimization problem by leveraging two bio-inspired search algorithms: Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Additionally, we extend our base RL-MTD model by integrating these algorithms, resulting in the creation of PSO-RL and GA-RL. We compare the performance of three models (base RL-MTD, GA-RL, and PSO-RL) across four misconfigured systems under test (SUTs) in terms of generating the most secure configuration. Results show that the optimized search space derived from both GA-RL and PSO-RL significantly enhances the performance of the base RL-MTD model compared to the version without an optimized search space. While both GA-RL and PSO-RL demonstrate effective search capabilities, PSO-RL slightly outperforms GA-RL for most SUTs. Overall, both algorithms excel at finding an optimal search space, which in turn improves the performance of the model in generating optimal secure configurations.

Paper Structure

This paper contains 31 sections, 2 equations, 11 figures, 1 table, 10 algorithms.

Figures (11)

  • Figure 1: A subset of Windows 10 configuration parameters associated with default secure settings
  • Figure 2: (Left) RL elements: the environment is the misconfigured SUT's attack surface; the agent is the Monte Carlo-based RL agent; the state represents the configuration (C) instance of the misconfigured SUT; the agent's actions are to either change a particular parameter setting or hold back; and rewards are given based on the change (positive, negative, or none) in the configuration's security score relative to its previous state. (Right) The RL-MTD agent interacts with a misconfigured SUT environment (e.g., Windows), where the state s is the current configuration (C); the agent takes an action (0 or 1), upon which the SUT moves the agent to the next state s' and returns a reward (0, 1, or -1).
  • Figure 3: A snapshot of an RL-MTD game in play. The agent starts from an insecure state (top-left), takes an action based on the config fitness score, receives a reward, and moves to the next intermediate state (top-right). This series of operations repeats at every step until the agent reaches the near-optimal secure configuration, which is the finish state (bottom).
  • Figure 4: The execution flow of the base RL-MTD model, where steps 1-4 make up environment(), step 5 is generate episode(), and step 6 is the MC_prediction() method
  • Figure 5: (Left) RL-MTD Algo: a diagrammatic view of all the important functions that use the search space/domain corresponding to a particular SUT. The agent randomly draws settings from this search range, either for the initial config state or when action=1 is chosen. (Right) The search space range for two types of parameter settings. If the default setting (P) is a single integer value v, the agent picks a setting from {v-lim, v+lim}, where lim is an integer hyperparameter that bounds the range. If the default setting (P) can be any value from a list of permissible non-negative integers (v1, v2, v3, ...) and/or 'None', the agent chooses either a numerical value from {0, max(v1, v2, v3, ...)+lim} or the string value 'None'. Issue: how can the value of lim be chosen effectively so that the search space range is optimized for better performance?
  • ...and 6 more figures
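
The search-space rule in the Figure 5 caption and the action/reward scheme in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names (search_space, apply_action, reward) are hypothetical, and the sketch assumes three things the captions leave open: that {v-lim, v+lim} denotes an inclusive integer range, that a reward of 1 means the security score improved and -1 that it worsened, and that a default list containing only 'None' falls back to the range 0..lim.

```python
import random
from typing import List, Optional, Union

Setting = Union[int, str]


def search_space(default: Union[int, List[Optional[int]]], lim: int) -> List[Setting]:
    """Candidate settings for one parameter, following the Figure 5 caption.

    Assumption: {a, b} in the caption is read as the inclusive range a..b.
    """
    if isinstance(default, int):
        # Single integer default v: candidates span v - lim .. v + lim.
        return list(range(default - lim, default + lim + 1))
    # List default (v1, v2, ...), possibly with None:
    # candidates span 0 .. max(v1, v2, ...) + lim, plus the string 'None'.
    numeric = [v for v in default if v is not None]
    upper = max(numeric, default=0) + lim  # fallback of 0 is an assumption
    return list(range(0, upper + 1)) + ["None"]


def apply_action(current: Setting, action: int,
                 default: Union[int, List[Optional[int]]],
                 lim: int, rng: random.Random) -> Setting:
    """Action 1 redraws the setting from the search space; action 0 holds."""
    if action == 1:
        return rng.choice(search_space(default, lim))
    return current


def reward(prev_score: float, new_score: float) -> int:
    """Return 1, -1, or 0 for an improved, worsened, or unchanged score."""
    if new_score > prev_score:
        return 1
    if new_score < prev_score:
        return -1
    return 0


if __name__ == "__main__":
    rng = random.Random(0)
    print(search_space(5, lim=2))             # range around a single default
    print(search_space([1, 4, None], lim=1))  # list default with 'None'
    print(apply_action(5, 1, 5, 2, rng), reward(0.4, 0.6))
```

Under this reading, lim directly controls how many candidates the agent samples from. A larger lim gives a wider and sparser range, which is the tuning problem the caption raises.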