Minimax Least-Square Policy Iteration for Cost-Aware Defense of Traffic Routing against Unknown Threats

Yuzhen Zhan; Li Jin

TL;DR

The paper addresses the problem of defending dynamic routing in parallel-queue networks against unknown adversaries. The authors extend least-squares policy iteration (LSPI) to a Markov security game between an attacker and a defender, using linear function approximation and a minimax policy-improvement step to approximate a Markov perfect equilibrium. They derive a finite-sample bound on the value-function evaluation error, decomposed into a projection error and two sampling errors, and propose the Minimax LSPI algorithm with convergence guarantees. The method yields threat-adaptive, cost-aware routing decisions without requiring prior knowledge of the attacker's policy, with potential applications in transportation, manufacturing, and data networks.
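As a minimal sketch of the least-squares policy-evaluation step, assume a linear approximation $Q(s,a,b) \approx \phi(s,a,b)^\top w$ over sampled transitions, where $a$ is the defender's action and $b$ the attacker's. The code below computes the textbook LSTD-Q fixed point; it is not necessarily the paper's exact estimator. The names `lstd_weights` and `expected_next_features` and the small ridge term `reg` are hypothetical and used here only for illustration.

```python
import numpy as np


def expected_next_features(phi_joint_next, pi_def, pi_att):
    """Average next-state features over both players' mixed policies.

    phi_joint_next : (n, |A_def|, |A_att|, k) features of every joint action
                     at each sampled next state
    pi_def         : (n, |A_def|) defender action probabilities at those states
    pi_att         : (n, |A_att|) attacker action probabilities at those states
    """
    return np.einsum("nijk,ni,nj->nk", phi_joint_next, pi_def, pi_att)


def lstd_weights(phi, phi_next, costs, gamma, reg=1e-6):
    """LSTD fixed point for a linear cost-to-go approximation.

    phi      : (n, k) features of sampled (state, defender action, attacker action)
    phi_next : (n, k) expected next-step features under the evaluated policies
    costs    : (n,) sampled one-step costs (queueing plus attack/defense costs)
    gamma    : discount factor in [0, 1)
    reg      : small ridge term (an assumption) that keeps the system well posed
    """
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    costs = np.asarray(costs, dtype=float)
    n, k = phi.shape
    # Sample-average projected Bellman equation A w = b.
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ costs / n
    return np.linalg.solve(A + reg * np.eye(k), b)
```

With one-hot (tabular) features, LSTD reduces to exact policy evaluation. For example, two states with deterministic transitions $s_0 \to s_1 \to s_1$, costs 1 and 2, and $\gamma = 0.5$ give weights close to the true values 3 and 4.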

Abstract

Dynamic routing is a representative control scheme in transportation, production lines, and data transmission. In the modern context of connectivity and autonomy, routing decisions are potentially vulnerable to malicious attacks. In this paper, we consider the dynamic routing problem over parallel traffic links in the face of such threats. An attacker can increase queue lengths or destabilize traffic queues by strategically manipulating the nominally optimal routing decisions. A defender can secure the correct routing decision. Both attacking and defending actions incur technological costs. The defender has no prior information about the attacker's strategy. We develop a least-squares policy iteration algorithm for the defender to compute a cost-aware and threat-adaptive defensive strategy. The policy evaluation step computes a weight vector that minimizes the sampled temporal-difference error. We derive a concrete theoretical upper bound on the evaluation error based on the theory of value function approximation. The policy improvement step solves a minimax problem and thus iteratively computes the Markov perfect equilibrium of the security game. We also discuss the training error of the entire policy iteration process.
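In standard LSPI, the greedy argmin is taken over the approximate Q-values. In the minimax version, the policy-improvement step instead solves a zero-sum matrix game at each state. The sketch below covers the simplest case and rests on two hypothetical assumptions: each player has two actions at a state (for example, defend or not, attack or not), and $Q[i][j] = \phi(s,i,j)^\top w$ is the approximate cost when the defender plays $i$ and the attacker plays $j$. Under these assumptions the defender's equilibrium mix has a closed form. The name `solve_2x2_zero_sum` is illustrative only. With larger action sets one would instead solve the standard linear program for a zero-sum matrix game.

```python
import math


def solve_2x2_zero_sum(Q, tol=1e-12):
    """Minimax over a 2x2 cost matrix: defender (row) minimizes, attacker (column) maximizes.

    Returns (p, value), where p is the defender's probability of playing row 0.
    """
    (q00, q01), (q10, q11) = Q
    upper = min(max(q00, q01), max(q10, q11))  # defender's best pure worst-case cost
    lower = max(min(q00, q10), min(q01, q11))  # attacker's best pure guaranteed cost
    if math.isclose(upper, lower, abs_tol=tol):
        # Pure saddle point: the row achieving the upper value is optimal.
        p = 1.0 if max(q00, q01) <= max(q10, q11) else 0.0
        return p, upper
    # No saddle point: mix so that the attacker is indifferent between columns.
    p = (q11 - q10) / (q00 - q10 - q01 + q11)
    return p, p * q00 + (1 - p) * q10
```

For example, the cost matrix `[[0, 3], [2, 1]]` has no pure saddle point. The defender plays row 0 with probability 0.25, and the game value is 1.5 whichever column the attacker chooses.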

Paper Structure (13 sections, 3 theorems, 61 equations, 3 figures, 1 algorithm)

Introduction
Model formulation
System and player models
Markov security game
Function approximation
Policy Evaluation
Decomposition
Sampling error for true value function
Sampling error for approximate value function
Policy Iteration
Minimax LSPI
Discussion
Conclusion and future work

Key Result

Theorem 1

Consider the security game on a parallel service system with $m$ servers of buffer size $L$. Let $\lambda$ be the arrival rate, $\gamma$ the discount rate, and $c_b$ the defending cost. Let $C_p$ be the constant and $W$ the vector defined in Assumption 1. Suppose a sample set … where $\nu_{\min}$ is the smallest eigenvalue of $\frac{1}{n}\Phi^\top\Phi$.
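The bound depends on $\nu_{\min}$, the smallest eigenvalue of the empirical Gram matrix $\frac{1}{n}\Phi^\top\Phi$, where $\Phi$ stacks the feature vectors of the $n$ samples row-wise. This quantity is positive only when the feature columns are linearly independent on the sample set. A minimal way to check it for a given sample (the function name is hypothetical):

```python
import numpy as np


def smallest_gram_eigenvalue(Phi):
    """Return nu_min, the smallest eigenvalue of (1/n) Phi^T Phi."""
    Phi = np.asarray(Phi, dtype=float)
    n = Phi.shape[0]
    gram = Phi.T @ Phi / n
    # The Gram matrix is symmetric; eigvalsh returns its eigenvalues in ascending order.
    return float(np.linalg.eigvalsh(gram)[0])
```

For example, the three feature rows $(1,0)$, $(0,1)$ and $(1,1)$ give $\frac{1}{3}\begin{pmatrix}2&1\\1&2\end{pmatrix}$, whose smallest eigenvalue is $1/3$. A value near zero signals nearly collinear features.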

Figures (3)

Figure 1: An $m$-queue system with shortest-queue routing under security failures.
Figure 2: Components used in the proof of Theorem 1.
Figure 3: Components used in the proof of Lemma 2.

Theorems & Definitions (5)

Definition 1
Theorem 1
Lemma 1
Definition 2
Lemma 2
