Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Chongyang Shi; Yuheng Bu; Jie Fu

TL;DR

The paper develops primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns, and proposes novel algorithms that compute the policy gradient of entropy for each observation by leveraging message passing in hidden Markov models.
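As a rough illustration of the primal-dual idea, a constrained problem of this kind can be written with a Lagrangian L(θ, λ) = H(θ) + λ (V(θ) - δ), where H is the opacity objective, V the total return, and δ a required return threshold. The primal variable θ follows gradient ascent on L and the multiplier λ ≥ 0 follows projected gradient descent. The sketch below assumes this sign convention and replaces the paper's entropy and total return with hypothetical scalar toy functions (H(θ) = -θ², V(θ) = θ, δ = 1) whose constrained optimum θ = 1, λ = 2 is known in closed form. In the paper, θ would be policy parameters and the gradients would come from policy gradient estimates; the function name here is hypothetical.

```python
def primal_dual(grad_H, V, grad_V, delta, theta0=0.0, alpha=0.05, beta=0.05, iters=2000):
    """Gradient descent-ascent on L(theta, lam) = H(theta) + lam * (V(theta) - delta)."""
    theta, lam = theta0, 0.0
    for _ in range(iters):
        # Both gradients are evaluated at the current (theta, lam) before updating.
        g_theta = grad_H(theta) + lam * grad_V(theta)
        g_lam = V(theta) - delta
        theta = theta + alpha * g_theta       # primal ascent step
        lam = max(0.0, lam - beta * g_lam)    # dual descent step, projected onto lam >= 0
    return theta, lam


# Hypothetical toy stand-ins: H(theta) = -theta**2, V(theta) = theta, delta = 1.
theta, lam = primal_dual(
    grad_H=lambda t: -2.0 * t,
    V=lambda t: t,
    grad_V=lambda t: 1.0,
    delta=1.0,
)
print(theta, lam)
```

Here the constraint V(θ) ≥ 1 is active at the solution, so the multiplier settles at a positive value that balances the entropy gradient against the return gradient.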

Abstract

The paper studies information-theoretic opacity, an information-flow privacy property, in a setting involving two agents: a planning agent who controls a stochastic system and an observer who partially observes the system states. The goal of the observer is to infer some secret, represented by a random variable, from its partial observations, while the goal of the planning agent is to make the secret maximally opaque to the observer while achieving a satisfactory total return. With the stochastic system modeled as a Markov decision process, we consider two classes of opacity properties: last-state opacity ensures that the observer is uncertain whether the last state is in a specific set, and initial-state opacity ensures that the observer is uncertain about the realization of the initial state. As the measure of opacity, we employ the Shannon conditional entropy of the secret given the observable, which quantifies how uncertain the observer remains about the secret after seeing the observable. We then develop primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns. We propose novel algorithms to compute the policy gradient of entropy for each observation, leveraging message passing within the hidden Markov models. This gradient computation enables stable and fast convergence. We demonstrate our opacity-enforcement controller on a grid-world example.
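To make the last-state opacity measure concrete, the sketch below evaluates H(Z_T | Y) on a toy hidden Markov model, where Z_T indicates whether the last state lies in the secret set. It assumes that a policy π induces the Markov chain P_π(s' | s) = Σ_a π(a | s) P(s' | s, a) and that the observer's observation depends only on the current state through a hypothetical matrix O[s, o]. The forward messages α_t(s) = P(y_0, ..., y_t, S_t = s) give both P(y) = Σ_s α_T(s) and the posterior P(Z_T = 1 | y). All function names are hypothetical, and this only evaluates the entropy; it does not reproduce the paper's gradient algorithm.

```python
import itertools

import numpy as np


def induced_chain(P_sa, pi):
    """P_pi[s, t] = sum_a pi[s, a] * P_sa[s, a, t] (policy-induced Markov chain)."""
    return np.einsum("sa,sat->st", pi, P_sa)


def forward_messages(P, O, mu0, obs):
    """Forward algorithm: returns alpha_T(s) = P(y_0, ..., y_T, S_T = s)."""
    alpha = mu0 * O[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ P) * O[:, o]
    return alpha


def last_state_conditional_entropy(P, O, mu0, secret, T):
    """Exact H(Z_T | Y_0..Y_T) in bits, with Z_T = 1 iff S_T is in `secret`.

    Enumerates every observation sequence, so it is only usable for tiny T.
    """
    n_obs = O.shape[1]
    H = 0.0
    for obs in itertools.product(range(n_obs), repeat=T + 1):
        alpha = forward_messages(P, O, mu0, obs)
        p_y = alpha.sum()
        if p_y == 0.0:
            continue  # this observation sequence is impossible
        q = alpha[secret].sum() / p_y  # posterior P(Z_T = 1 | y)
        for p in (q, 1.0 - q):
            if p > 0.0:
                H -= p_y * p * np.log2(p)
    return H
```

Exhaustive enumeration grows exponentially with T, so beyond toy sizes one would typically average over sampled observation sequences instead.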

Paper Structure (10 sections, 38 equations, 3 figures)

Introduction
Preliminary and Problem Formulation
Preliminaries
Problem Statement
Synthesizing Maximally Opacity-Enforcement Controllers For Last-state Opacity
Primal-Dual Policy Gradient for Constrained Minimal Information Leakage
Computing the Gradient of Conditional Entropy
Synthesizing Maximally Opacity-Enforcement Controllers For Initial-state Opacity
Experiment Evaluation
Conclusion and Future Work

Figures (3)

Figure 1: The red robot is P1 (the agent). P1 can move in the four compass directions (north, south, east, west) or remain stationary. However, the movement dynamics are stochastic: when the robot moves in a given direction, there is a 0.1 probability that it instead moves in each of the two neighboring directions. For instance, if the robot moves east, it moves north with probability 0.1 and south with probability 0.1, as illustrated in the figure. If the robot hits the boundary, it stays put.
Figure 2: Results of the primal-dual policy gradient algorithm. The blue line shows the opacity and the red line shows the estimated total return.
Figure 3: Comparison with the baseline. The dashed lines show the entropies and values obtained by our method; the colored lines show the entropies and values obtained by the baseline method.
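The stochastic movement described in the Figure 1 caption can be encoded as a transition function. This is a sketch under stated assumptions: the grid size is a parameter, north increases the y coordinate, the intended direction is assumed to receive the remaining probability 0.8, and the stay action is assumed to be deterministic. None of these details is fixed by the caption, and the names `step` and `transition` are hypothetical.

```python
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}
# The two directions a move can slip into (perpendicular neighbors).
NEIGHBORS = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}


def step(cell, direction, width, height):
    """Deterministic move; hitting the boundary leaves the robot in place."""
    x, y = cell
    dx, dy = MOVES[direction]
    nx, ny = x + dx, y + dy
    if 0 <= nx < width and 0 <= ny < height:
        return (nx, ny)
    return cell


def transition(cell, action, width, height, slip=0.1):
    """Return a dict mapping next cell -> probability for (cell, action)."""
    if action == "stay":
        return {cell: 1.0}  # assumption: staying is deterministic
    outcomes = [(action, 1.0 - 2 * slip)] + [(d, slip) for d in NEIGHBORS[action]]
    dist = {}
    for direction, p in outcomes:
        nxt = step(cell, direction, width, height)
        dist[nxt] = dist.get(nxt, 0.0) + p  # merge outcomes that land on the same cell
    return dist


print(transition((1, 1), "E", 4, 4))
```

Merging outcomes that land on the same cell is what makes boundary collisions add probability mass to staying put.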

Theorems & Definitions (3)

Definition 1: Observation function of P2
Definition 2
Example 1: Grid World Example
