Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing

Taha Eghtesad, Yevgeniy Vorobeychik, Aron Laszka

TL;DR

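This work formulates a strategically zero-sum game between an attacker, who injects false data into vehicular routing, and a defender, who detects anomalies from the observed travel times of network edges. It proposes a computational method based on multi-agent reinforcement learning to compute a Nash equilibrium of this game. The resulting detection strategy keeps total travel time within a worst-case bound, even in the presence of an attack.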

Abstract

In modern transportation networks, adversaries can manipulate routing algorithms using false data injection attacks, such as simulating heavy traffic with multiple devices running crowdsourced navigation applications, to mislead vehicles toward suboptimal routes and increase congestion. To address these threats, we formulate a strategically zero-sum game between an attacker, who injects such perturbations, and a defender, who detects anomalies based on the observed travel times of network edges. We propose a computational method based on multi-agent reinforcement learning to compute a Nash equilibrium of this game, providing an optimal detection strategy, which ensures that total travel time remains within a worst-case bound, even in the presence of an attack. We present an extensive experimental evaluation that demonstrates the robustness and practical benefits of our approach, providing a powerful framework to improve the resilience of transportation networks against false data injection. In particular, we show that our approach yields approximate equilibrium policies and significantly outperforms baselines for both the attacker and the defender.

Paper Structure (32 sections, 17 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Dividing a two-player game environment into two single-agent reinforcement learning problems for the PSRO algorithm: one for player $p$, in which the opponent $\bar{p}$ plays its mixed-strategy Nash equilibrium (MSNE) $\sigma_{\bar{p}}^*$, and one for $\bar{p}$, in which $p$ plays its MSNE $\sigma_p^*$.
  • Figure 2: Experimental evaluation and comparison of our approach to alternative strategies. First, we compare various baseline attack strategies to our equilibrium solutions; here, a higher total travel time indicates a more effective attack (higher is better for the attacker). Second, we compare various defense strategies to our equilibrium solutions; here, a lower total travel time indicates a more effective defense (lower is better for the defender). The results show that our approach outperforms the alternatives in both roles, inducing higher travel times than other attacks and securing lower travel times than other defenses. Crucially, our equilibrium-based defender is robust against these alternative attacks without prior training on them.
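
The PSRO loop summarized in the Figure 1 caption can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a small zero-sum matrix game (rock-paper-scissors, chosen only for illustration) in place of the routing environment, it uses exact best responses where the paper trains reinforcement learning policies, and it uses fictitious play as a stand-in solver for the MSNE of the restricted game. All function names are hypothetical.

```python
"""Toy PSRO-style loop on a zero-sum matrix game (illustrative assumptions only)."""


def solve_meta_game(payoff, rows, cols, iters=10000):
    """Approximate the MSNE of the game restricted to `rows` x `cols`.

    Uses fictitious play (a stand-in meta-solver, not taken from the paper).
    payoff[r][c] is the row player's (attacker's) payoff; the column player
    (defender) receives the negative.
    """
    row_counts = {r: 0 for r in rows}
    col_counts = {c: 0 for c in cols}
    row_counts[rows[0]] += 1
    col_counts[cols[0]] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        best_r = max(rows, key=lambda r: sum(payoff[r][c] * n for c, n in col_counts.items()))
        best_c = min(cols, key=lambda c: sum(payoff[r][c] * n for r, n in row_counts.items()))
        row_counts[best_r] += 1
        col_counts[best_c] += 1
    total_r = sum(row_counts.values())
    total_c = sum(col_counts.values())
    sigma_row = {r: n / total_r for r, n in row_counts.items()}
    sigma_col = {c: n / total_c for c, n in col_counts.items()}
    return sigma_row, sigma_col


def best_response_row(payoff, sigma_col):
    """Exact oracle for the row player against a fixed column mixture."""
    values = [sum(row[c] * p for c, p in sigma_col.items()) for row in payoff]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


def best_response_col(payoff, sigma_row):
    """Exact oracle for the column player against a fixed row mixture."""
    n_cols = len(payoff[0])
    values = [sum(payoff[r][c] * p for r, p in sigma_row.items()) for c in range(n_cols)]
    best = min(range(n_cols), key=values.__getitem__)
    return best, values[best]


def psro(payoff, max_rounds=20):
    """Grow each player's strategy set with best responses to the opponent's MSNE."""
    rows, cols = [0], [0]
    for _ in range(max_rounds):
        sigma_row, sigma_col = solve_meta_game(payoff, rows, cols)
        # Each oracle faces a fixed opponent mixture, i.e. a single-agent problem.
        br_r, _ = best_response_row(payoff, sigma_col)
        br_c, _ = best_response_col(payoff, sigma_row)
        if br_r in rows and br_c in cols:
            break  # no new strategy for either player
        if br_r not in rows:
            rows.append(br_r)
        if br_c not in cols:
            cols.append(br_c)
    return sigma_row, sigma_col


def exploitability(payoff, sigma_row, sigma_col):
    """Gap between the two best-response values; zero at an exact equilibrium."""
    _, row_gain = best_response_row(payoff, sigma_col)
    _, col_loss = best_response_col(payoff, sigma_row)
    return row_gain - col_loss


rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # hypothetical payoff matrix
sigma_attacker, sigma_defender = psro(rps)
print(sigma_attacker, sigma_defender, exploitability(rps, sigma_attacker, sigma_defender))
```

Fixing the opponent's mixture before calling each oracle is what turns the two-player game into two single-agent problems. In the paper's setting, an RL learner would take the place of each exact oracle.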