Sensor Scheduling in Intrusion Detection Games with Uncertain Payoffs
Jayanth Bhargav, Shreyas Sundaram, Mahsa Ghasemi
TL;DR
This work addresses sensor scheduling for intrusion detection by casting it as a two-player zero-sum matrix game on a graph, where the defender’s joint strategy space grows exponentially with the number of sensors, making exact computation of Nash equilibrium (NE) strategies computationally expensive. It introduces a Distributed Weighted Majority algorithm that exploits payoff structure to efficiently approximate NE strategies with convergence guarantees and substantially reduced computation compared to full enumeration or linear programming. The authors then extend to settings with unknown sensor models via online learning, deriving high-probability regret bounds under both homogeneous and heterogeneous sensor scenarios using bandit feedback and UCB techniques. Empirical results on grid-world environments demonstrate strong, scalable performance for both known and unknown payoff scenarios, highlighting practical applicability in real-time intrusion-detection systems.
Abstract
We study the problem of sensor scheduling for an intrusion detection task. We model this as a two-player zero-sum game over a graph, where the defender (Player 1) seeks to identify the optimal strategy for scheduling sensor orientations to minimize the probability of missed detection at minimal cost, while the intruder (Player 2) aims to identify the optimal path selection strategy to maximize missed detection probability at minimal cost. The defender's strategy space grows exponentially with the number of sensors, making direct computation of the Nash Equilibrium (NE) strategies computationally expensive. To tackle this, we propose a distributed variant of the Weighted Majority algorithm that exploits the structure of the game's payoff matrix, enabling efficient computation of the NE strategies with provable convergence guarantees. Next, we consider a more challenging scenario where the defender lacks knowledge of the true sensor models and, consequently, the game's payoff matrix. For this setting, we develop online learning algorithms that leverage bandit feedback from sensors to estimate the NE strategies. By building on existing results from perturbation theory and online learning in matrix games, we derive high-probability order-optimal regret bounds for our algorithms. Finally, through simulations, we demonstrate the empirical performance of our proposed algorithms in both known and unknown payoff scenarios.
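As a point of reference for the known-payoff setting, the following is a minimal sketch of the standard (centralized) Weighted Majority / Hedge update on a small zero-sum matrix game. It is not the distributed variant proposed in the paper. The function name `weighted_majority_ne`, the 3x3 matrix `A`, the step size, and the best-responding column player are all illustrative assumptions: rows stand for hypothetical defender schedules, columns for hypothetical intruder paths, and entries for missed-detection payoffs in [0, 1].

```python
import numpy as np

def weighted_majority_ne(A, T=5000):
    """Approximate NE strategies of a zero-sum matrix game (illustrative sketch).

    A[i, j] is the loss (e.g., missed-detection payoff in [0, 1]) to the row
    player (defender) when it plays row i and the column player (intruder)
    plays column j. The row player runs Hedge; the column player best-responds.
    """
    m, n = A.shape
    eta = np.sqrt(np.log(m) / T)      # standard Hedge step size (assumption)
    w = np.ones(m) / m                # normalized weights over defender rows
    x_sum = np.zeros(m)               # running sum of defender mixed strategies
    y_counts = np.zeros(n)            # how often each intruder column was played
    for _ in range(T):
        x = w / w.sum()
        j = int(np.argmax(x @ A))     # intruder best response to current x
        x_sum += x
        y_counts[j] += 1
        w = w * np.exp(-eta * A[:, j])  # down-weight rows with high loss
        w = w / w.sum()                 # renormalize for numerical stability
    return x_sum / T, y_counts / T

# Hypothetical 3x3 payoff matrix, used only for illustration.
A = np.array([[0.2, 0.9, 0.6],
              [0.8, 0.1, 0.7],
              [0.5, 0.6, 0.3]])
x_bar, y_bar = weighted_majority_ne(A)

# Duality gap: worst-case loss of x_bar minus guaranteed payoff of y_bar.
gap = float(np.max(x_bar @ A) - np.min(A @ y_bar))
print("defender strategy:", x_bar)
print("intruder strategy:", y_bar)
print("duality gap:", gap)
```

The duality gap of the time-averaged strategies is bounded by the row player's average regret, so it shrinks on the order of sqrt(log(m) / T). In this centralized form the weight vector has one entry per joint defender action, which is exactly the exponential cost that motivates exploiting the payoff structure.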
