Adaptive Mitigation of Insider Threats via Off-Policy Learning

Gehui Xu, Kaiwen Chen, Zhong-Ping Jiang, Thomas Parisini, Andreas A. Malikopoulos

Abstract

An insider is a team member who covertly deviates from the team's optimal collaborative strategy to pursue a private objective while still appearing cooperative. Such an insider may initially behave cooperatively but later switch to selfish or malicious actions, thereby degrading collective performance, threatening mission success, and compromising operational safety. In this paper, we study such insider threats within an insider-aware, game-theoretic formulation, where the insider interacts with a decision maker (DM) under a continuous-time switched system, with each time interval characterized by a distinct insider behavioral pattern or threat level. We develop a periodic off-policy mitigation scheme that enables the DM to learn optimal mitigation policies from online data when encountering different insider threats, without requiring a priori knowledge of insider intentions. By designing appropriate conditions on the inter-learning interval time, we establish convergence guarantees for both the learning process and the closed-loop system, and characterize the corresponding mitigation performance achieved by the DM.
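
To make the loop described above concrete, below is a minimal sketch of the periodic mitigation idea, under illustrative assumptions: the insider's changing behavior switches the effective dynamics between hypothetical modes, and the DM re-learns a stabilizing mitigation gain once per interval. The mode matrices, cost weights, initial gain, and interval length `T` are all placeholders, and a model-based Kleinman iteration stands in for the paper's data-driven off-policy learning step.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_gain(A, B, Q, R, K, iters=25):
    """Kleinman policy iteration from a stabilizing initial gain K."""
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: Acl^T P + P Acl + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
    return K

B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
# Hypothetical insider modes (cooperative -> selfish -> malicious), each
# changing the effective open-loop dynamics seen by the DM.
modes = [np.array([[0.0, 1.0], [-1.0, a]]) for a in (0.5, 1.0, 2.0)]

x = np.array([1.0, -1.0])
K = np.array([[0.0, 5.0]])  # stabilizing initial guess (holds for these modes)
dt, T = 1e-3, 2.0           # Euler step and illustrative inter-learning interval
for A in modes:
    K = kleinman_gain(A, B, Q, R, K)   # DM re-learns its mitigation gain
    for _ in range(int(T / dt)):       # roll the closed loop forward
        x = x + dt * (A - B @ K) @ x
    print(f"mode done, |x| = {np.linalg.norm(x):.2e}")
```

In the paper's setting the DM does not know the mode dynamics; the off-policy scheme recovers the same gain updates from online trajectory data, with conditions on the inter-learning interval guaranteeing convergence.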

Paper Structure

This paper contains 10 sections, 6 theorems, 40 equations, 3 figures, and 1 algorithm.

Key Result

Lemma 1

Let $\mathcal{K}^0 \in \mathbb{R}^{m \times q}$ be any stabilizing feedback gain matrix, i.e., $\mathcal{A} - \mathcal{B}\mathcal{K}^0$ is Hurwitz. For $k = 0,1,\ldots$, let $\mathcal{P}^k \in \mathbb{R}^{q \times q}$ be the real symmetric positive definite solution of the Lyapunov equation
$$(\mathcal{A} - \mathcal{B}\mathcal{K}^k)^{\top}\mathcal{P}^k + \mathcal{P}^k(\mathcal{A} - \mathcal{B}\mathcal{K}^k) + \mathcal{Q} + (\mathcal{K}^k)^{\top}\mathcal{R}\,\mathcal{K}^k = 0,$$
where $\mathcal{Q} = \mathcal{Q}^{\top} \succeq 0$ and $\mathcal{R} = \mathcal{R}^{\top} \succ 0$ are the weighting matrices of the quadratic cost, and where, for $k \geq 1$, $\mathcal{K}^k$ is computed recursively by
$$\mathcal{K}^{k} = \mathcal{R}^{-1}\mathcal{B}^{\top}\mathcal{P}^{k-1}.$$
Then, the following properties hold:

  1. $\mathcal{A} - \mathcal{B}\mathcal{K}^k$ is Hurwitz for all $k \geq 0$;
  2. $\mathcal{P}^{\star} \preceq \mathcal{P}^{k+1} \preceq \mathcal{P}^{k}$ for all $k \geq 0$;
  3. $\lim_{k \to \infty} \mathcal{K}^k = \mathcal{K}^{\star}$ and $\lim_{k \to \infty} \mathcal{P}^k = \mathcal{P}^{\star}$, where $\mathcal{P}^{\star}$ is the unique symmetric positive definite solution of the algebraic Riccati equation $\mathcal{A}^{\top}\mathcal{P}^{\star} + \mathcal{P}^{\star}\mathcal{A} + \mathcal{Q} - \mathcal{P}^{\star}\mathcal{B}\mathcal{R}^{-1}\mathcal{B}^{\top}\mathcal{P}^{\star} = 0$, and $\mathcal{K}^{\star} = \mathcal{R}^{-1}\mathcal{B}^{\top}\mathcal{P}^{\star}$.
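
The lemma can be checked numerically, as in the short sketch below; the matrices $\mathcal{A}$, $\mathcal{B}$ and cost weights $\mathcal{Q}$, $\mathcal{R}$ are illustrative, not taken from the paper. The iterates decrease monotonically and converge to the stabilizing solution of the algebraic Riccati equation.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 2.0]])   # hypothetical unstable dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)                # assumed cost weights

K = np.array([[0.0, 5.0]])                 # any stabilizing K^0: A - B K^0 Hurwitz
P_prev = None
for k in range(15):
    Acl = A - B @ K
    # Lyapunov equation of Lemma 1: Acl^T P + P Acl + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    if P_prev is not None:
        # Monotonicity (property 2): P^{k+1} <= P^k in the matrix sense
        assert np.all(np.linalg.eigvalsh(P_prev - P) >= -1e-9)
    P_prev = P
    K = np.linalg.solve(R, B.T @ P)        # recursion K^{k+1} = R^{-1} B^T P^k
P_star = solve_continuous_are(A, B, Q, R)
print("converged to ARE solution:", np.allclose(P, P_star, atol=1e-8))
```

Each iteration only solves a linear Lyapunov equation, which is what makes the recursion amenable to the data-driven, off-policy implementation the paper builds on it.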

Figures (3)

  • Figure 3: Periodic mitigation scheme.
  • Figure 4: State trajectories switching between three operating modes.
  • Figure 5: Convergence performance of system state. Top: results obtained using fully clean data. Bottom: results obtained using partially mixed data.

Theorems & Definitions (7)

  • Definition 1
  • Lemma 1 (Kleinman, 1968)
  • Lemma 2 (Jiang and Jiang, 2012)
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Corollary 1