Covert Adversarial Actuators in Finite MDPs

Edoardo David Santi, Gongpu Chen, Deniz Gündüz, Asaf Cohen

TL;DR

This paper models a finite MDP in which a compromised actuator may covertly override the controller's policy to reduce the long-run reward. It shows that an adversarial stationary policy is perfectly covert exactly when it preserves the state-transition matrix induced by the controller's policy; otherwise, detection becomes certain over an infinite horizon. A linear program identifies the optimal adversarial policy under the perfect-covertness constraint. Using large-deviation theory with the differential divergence $D_K$, the paper derives asymptotic error exponents for detection under both known and unknown adversarial policies, and it provides an anomaly-detection framework and optimization formulations that balance covertness against reward degradation. The results give tractable methods for designing adversarial policies and detectors with explicit asymptotic guarantees, with potential implications for secure control and covert communication.

Abstract

We consider a Markov decision process (MDP) in which actions prescribed by the controller are executed by a separate actuator, which may behave adversarially. At each time step, the controller selects and transmits an action to the actuator; however, the actuator may deviate from the intended action to degrade the control reward. Given that the controller observes only the sequence of visited states, we investigate whether the actuator can covertly deviate from the controller's policy to minimize the controller's reward without being detected. We establish conditions for covert adversarial behavior over an infinite time horizon and formulate an optimization problem to determine the optimal adversarial policy under these conditions. Additionally, we derive the asymptotic error exponents for detection in two scenarios: (1) a binary hypothesis testing framework, where the actuator either follows the prescribed policy or a known adversarial strategy, and (2) a composite hypothesis testing framework, where the actuator may employ any stationary policy. For the latter case, we also propose an optimization problem to maximize the adversary's performance.
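
As a rough illustration of what an optimization under perfect covertness could look like, the sketch below writes it as a linear program. The symbols $r(s,a)$ (reward), $P(s'\mid s,a)$ (transition kernel), and $\mu^*$ (stationary distribution of the chain under $\pi^*$, assumed ergodic) are assumptions introduced here and may not match the paper's notation or exact formulation. The idea is that a policy inducing the same state-transition matrix $T^*$ as $\pi^*$ also shares $\mu^*$, so the long-run average reward becomes linear in the policy:

$$
\begin{aligned}
\min_{\pi}\quad & \sum_{s}\mu^*(s)\sum_{a}\pi(a\mid s)\,r(s,a)\\
\text{s.t.}\quad & \sum_{a}\pi(a\mid s)\,P(s'\mid s,a)=T^*(s,s')\quad\forall s,s',\\
& \sum_{a}\pi(a\mid s)=1\quad\forall s,\qquad \pi(a\mid s)\ge 0\quad\forall s,a.
\end{aligned}
$$

Under these assumptions the constraints couple only the actions within a single state, so the program separates into one small linear program per state.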

Paper Structure

This paper contains 19 sections, 8 theorems, 35 equations, 1 figure.

Key Result

Theorem 1

For any stationary policy $\pi^{\text{adv}}\neq\pi^*$, $\pi^{\text{adv}}$ is 1-covert if $T^{\text{adv}}=T^*$ and 0-covert otherwise.
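
The condition $T^{\text{adv}}=T^*$ in Theorem 1 can be checked numerically. The following is a minimal sketch, assuming the induced transition matrix is $T^{\pi}(s,s')=\sum_a \pi(a\mid s)\,P(s'\mid s,a)$ for a transition kernel $P$; the kernel, the toy numbers, and the function names are all hypothetical and not taken from the paper.

```python
from typing import List

Kernel = List[List[List[float]]]  # P[s][a][s2]: prob. of s -> s2 under action a (assumed layout)
Policy = List[List[float]]        # pi[s][a]: prob. of taking action a in state s


def induced_transition_matrix(P: Kernel, pi: Policy) -> List[List[float]]:
    """Return T[s][s2] = sum_a pi[s][a] * P[s][a][s2]."""
    n_states = len(P)
    T = []
    for s in range(n_states):
        row = [0.0] * n_states
        for a, p_a in enumerate(pi[s]):
            for s2 in range(n_states):
                row[s2] += p_a * P[s][a][s2]
        T.append(row)
    return T


def preserves_transitions(P: Kernel, pi_star: Policy, pi_adv: Policy,
                          tol: float = 1e-9) -> bool:
    """True if pi_adv induces the same state-transition matrix as pi_star (up to tol)."""
    T_star = induced_transition_matrix(P, pi_star)
    T_adv = induced_transition_matrix(P, pi_adv)
    return all(abs(x - y) <= tol
               for row_star, row_adv in zip(T_star, T_adv)
               for x, y in zip(row_star, row_adv))


# Hypothetical toy MDP: 2 states, 3 actions.
P = [
    [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],  # state 0
    [[0.3, 0.7], [0.3, 0.7], [0.9, 0.1]],  # state 1
]
pi_star = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]        # controller's policy
pi_covert = [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0]]      # different actions, same T
pi_detectable = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # changes the row of state 1

covert_ok = preserves_transitions(P, pi_star, pi_covert)
detectable_ok = preserves_transitions(P, pi_star, pi_detectable)
```

In this toy example `pi_covert` mixes two actions in state 0 so that their average reproduces the controller's transition row, which is the kind of deviation that a state-only observer cannot distinguish from $\pi^*$; `pi_detectable` alters the transition row of state 1 and therefore fails the check.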

Figures (1)

  • Figure 1: System model. The controller wants the actuator to follow policy $\pi^*$; however, the compromised actuator instead aims to minimize the reward without being detected.

Theorems & Definitions (19)

  • Theorem 1
  • proof
  • Remark 1
  • Definition 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • proof
  • Definition 2
  • Theorem 5
  • ...and 9 more