Identity Concealment Games: How I Learned to Stop Revealing and Love the Coincidences

Mustafa O. Karabag, Melkior Ornik, Ufuk Topcu

TL;DR

An algorithm is proposed that provably learns a near-optimal policy for the hostile player from game runs collected under the average player's policy, and an upper bound on the number of sample runs required is given.

Abstract

In an adversarial environment, a hostile player performing a task may behave like a non-hostile one in order not to reveal its identity to an opponent. To model such a scenario, we define identity concealment games: zero-sum stochastic reachability games with a zero-sum objective of identity concealment. To measure the identity concealment of the player, we introduce the notion of an average player. The average player's policy represents the expected behavior of a non-hostile player. We show that there exists an equilibrium policy pair for every identity concealment game and give the optimality equations to synthesize an equilibrium policy pair. If the player's opponent follows a non-equilibrium policy, the player can hide its identity better. For this reason, we study how the hostile player may learn the opponent's policy. Since learning via exploration policies would quickly reveal the hostile player's identity to the opponent, we consider the problem of learning a near-optimal policy for the hostile player using the game runs collected under the average player's policy. Consequently, we propose an algorithm that provably learns a near-optimal policy and give an upper bound on the number of sample runs to be collected.
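
To make the learning setup concrete, here is a minimal sketch of how an opponent's policy might be estimated from game runs observed while the average player's policy is followed. It is an illustration only, not the paper's algorithm: it assumes a stationary opponent policy, uses a plain empirical-frequency (maximum-likelihood) estimate, and the function name `estimate_opponent_policy` and the run format of `(state, opponent_action)` pairs are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_opponent_policy(runs):
    """Empirical estimate of a stationary opponent policy (illustrative assumption).

    `runs` is a list of runs; each run is a list of (state, opponent_action)
    pairs observed while the average player's policy was being followed.
    Returns {state: {action: estimated probability}}.
    """
    counts = defaultdict(Counter)
    for run in runs:
        for state, action in run:
            counts[state][action] += 1

    policy = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        # Relative frequency of each action among the visits to this state.
        policy[state] = {a: c / total for a, c in action_counts.items()}
    return policy


# Hypothetical sample runs with two states and two opponent actions.
runs = [
    [("s0", "a"), ("s1", "b")],
    [("s0", "a"), ("s1", "a")],
    [("s0", "b"), ("s1", "b")],
    [("s0", "a")],
]
policy = estimate_opponent_policy(runs)
```

States that are rarely visited under the average player's policy receive few samples and therefore poor estimates, which is one reason a bound on the number of sample runs matters.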

Paper Structure

This paper contains 4 sections, 2 theorems, 10 equations, 2 figures, 1 table.

Key Result

Lemma 17

Let $\mathcal{D}$ be a discrete probability distribution such that $\mathcal{D}(n) \geq 0$ if $n \in \mathbb{N}$ and $\mathcal{D}(n) = 0$ otherwise, and let $c_{1}, c_{2} \in (0, \infty)$ be arbitrary constants. Define set $D$ such that $n \in D$ if and only if $\mathcal{D}(n) > c_{1} \exp(-n c_{2})$

Figures (2)

  • Figure 3: Receiver operating characteristic curve of the likelihood ratio classifier that identifies hostile clients. True positive rate is the ratio of detected attackers to all attackers. False positive rate is the ratio of real clients identified as an attacker to all clients.
  • Figure 4: The value of the objective function and the probability of losing for different values of $m$ and $n$. The dashed line in (a) marks the value of the objective function for the optimal winning policy. The dashed line in (b) marks the probability of losing under the average player's policy.
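
The likelihood ratio classifier mentioned in the Figure 3 caption can be sketched as follows. This is a hedged illustration, not the authors' experimental code: the policies, runs, and the helper names `log_likelihood_ratio` and `roc_point` are hypothetical, both policies are assumed to assign positive probability to every observed action, and the false positive rate follows the conventional ROC definition (real clients flagged as attackers divided by the number of real clients).

```python
import math


def log_likelihood_ratio(run, hostile_policy, average_policy):
    """log P(run | hostile) - log P(run | average) for a run of (state, action) pairs.

    Assumes both policies give positive probability to every observed action.
    """
    return sum(
        math.log(hostile_policy[s][a]) - math.log(average_policy[s][a])
        for s, a in run
    )


def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) when flagging score > threshold.

    `labels` holds True for an attacker and False for a real client.
    """
    tp = sum(1 for sc, y in zip(scores, labels) if y and sc > threshold)
    fp = sum(1 for sc, y in zip(scores, labels) if not y and sc > threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


# Hypothetical single-state policies and labelled runs.
hostile = {"s": {"a": 0.9, "b": 0.1}}
average = {"s": {"a": 0.5, "b": 0.5}}
runs = [
    [("s", "a"), ("s", "a")],  # attacker
    [("s", "a"), ("s", "b")],  # attacker
    [("s", "b"), ("s", "b")],  # real client
    [("s", "a"), ("s", "a")],  # real client
]
labels = [True, True, False, False]
scores = [log_likelihood_ratio(r, hostile, average) for r in runs]
fpr, tpr = roc_point(scores, labels, threshold=0.0)
```

Sweeping `threshold` over the range of scores traces out a full receiver operating characteristic curve.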

Theorems & Definitions (9)

  • Lemma 17
  • Lemma 18
  • Proof of Lemma \ref{lemma:ultimatelemma}
  • Proof sketch for Lemma \ref{lemma:empricalisclose}
  • Proof sketch for Lemma \ref{lemma:closemdpcloseoutcome}
  • Proof of Lemma \ref{lemma:lowsamplelowreachChernoffImproved}
  • Proof of Lemma \ref{lemma:highreachhighcostImproved}
  • Proof of Lemma \ref{lemma:divergentsubset}
  • Proof sketch for Lemma \ref{lemma:lowvaluesareuseless}