Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe

TL;DR

This paper addresses patrol planning in green security where defenders must balance immediate detection of poaching with gathering data to improve future predictions. It introduces LIZARD, a no-regret online learning algorithm that leverages reward decomposability and Lipschitz continuity to efficiently learn in a high-dimensional, continuous-action space with budgeted patrols. The authors prove regret bounds under fixed and adaptive discretization, showing improvements over existing Lipschitz and combinatorial bandit approaches, and demonstrate superior performance on real poaching data from Cambodia as well as synthetic scenarios. The work offers a principled, practically viable method for rapid deployment in conservation settings, enabling effective short-term protection while improving long-term predictive models.

Abstract

Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.

Paper Structure

This paper contains 27 sections, 10 theorems, 38 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Theorem 2

Given the minimum discretization gap $\Delta$, number of arms $N$, Lipschitz constant $L$, and time horizon $T$, the regret bound of Algorithm alg:decomposed with $\textsc{selfUCB}$ is

Figures (6)

  • Figure 1: Rangers searching for snares (right) near a waterhole (left) in Srepok Wildlife Sanctuary in Cambodia. The waterhole is frequented by deer, pig, and bison, which are targeted by poachers.
  • Figure 2: Naively protecting 10 out of 100 potential poaching targets based on our predictions would require 6 years of data to accurately protect the most important 10 targets.
  • Figure 3: The Lipschitz assumption enables us to prune confidence bounds. We show the impact of each $\textsc{selfUCB}$ on the $\textsc{UCB}$s of other arms in the effort space of target $i$. The solid brackets represent the $\textsc{selfUCB}$s. The dashed lines represent the bounds imposed by each arm on the rest of the space. The shaded green region covers the potential value of the reward function at different levels of effort. We visualize the additive effect of three assumptions: (a) Lipschitz continuity, (b) zero effort yields zero reward, and (c) monotonicity. Note that these plots show $\textsc{UCB}$s for a single target; Lipschitz continuity also applies across targets based on feature similarity.
  • Figure 4: Map of Srepok with a $5 \times 5$ km region highlighted and the real-world reward functions of the corresponding 25 targets.
  • Figure 5: Performance, measured in terms of percentage of reward achieved between $\textsc{optimal} - \textsc{exploit}$, over time. Shaded region shows standard error. Setting shown is $N = 25$, $B=1$. LIZARD (green) performs best.
  • ...and 1 more figure
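
The pruning described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes, as a reading of the caption rather than the paper's exact definition, that each arm's $\textsc{UCB}$ is the tightest bound implied by any arm's $\textsc{selfUCB}$ plus the Lipschitz constant $L$ times the distance in effort. The function name `pruned_ucbs`, its parameters, and the way monotonicity is applied are hypothetical choices for illustration. The zero-effort constraint is handled by the caller passing a $\textsc{selfUCB}$ of 0 at effort 0.

```python
def pruned_ucbs(efforts, self_ucbs, lipschitz, monotone=False):
    """Tighten each arm's UCB using the selfUCBs of all arms of one target.

    efforts   -- discretized effort levels for the target (hypothetical input)
    self_ucbs -- confidence bound each arm derives from its own samples
    lipschitz -- assumed Lipschitz constant L of the reward function
    monotone  -- if True, assume reward is non-decreasing in effort, so an
                 arm with higher effort bounds lower-effort arms directly
    """
    ucbs = []
    for j, effort_j in enumerate(efforts):
        best = self_ucbs[j]
        for k, effort_k in enumerate(efforts):
            dist = abs(effort_j - effort_k)
            if monotone and effort_k >= effort_j:
                # Reward at lower effort cannot exceed reward at higher effort.
                dist = 0.0
            # Lipschitz continuity: |mu(e_j) - mu(e_k)| <= L * |e_j - e_k|.
            best = min(best, self_ucbs[k] + lipschitz * dist)
        ucbs.append(best)
    return ucbs


# Zero effort yields zero reward, so the selfUCB at effort 0 is set to 0.
efforts = [0.0, 0.5, 1.0]
print(pruned_ucbs(efforts, [0.0, 1.0, 0.6], lipschitz=1.0))
print(pruned_ucbs(efforts, [0.0, 0.9, 0.3], lipschitz=2.0, monotone=True))
```

In the first call, the loose bound of the rarely sampled middle arm is pulled down by the zero-effort arm. In the second, monotonicity lets the well-sampled high-effort arm cap the bound of the arms below it.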

Theorems & Definitions (18)

  • Example 1
  • Theorem 2
  • proof : Proof sketch
  • Theorem 3
  • proof : Proof sketch
  • Theorem 4
  • Theorem 5
  • Corollary 5
  • Theorem 5
  • proof
  • ...and 8 more