Multi-Agent Reinforcement Learning for Joint Police Patrol and Dispatch

Matthew Repasky, He Wang, Yao Xie

TL;DR

This work aims to reduce police emergency response times by jointly optimizing patrol and dispatch decisions. It introduces a multi-agent reinforcement learning (MARL) framework in which each patroller is an independent Q-learner sharing a single deep Q-network, while a dispatcher assigns patrollers to incidents using mixed-integer programming (MIP) guided by value-function approximation. A coordinate-descent-like alternating optimization links the patrol and dispatch policies, yielding joint policies that outperform baselines optimized for either task alone, both in simulated scenarios and in a setting inspired by Southwest Atlanta data. The results show faster response times and improved fairness metrics, with practical implications for resource-constrained police operations that must balance efficiency and equity.
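
To make the dispatch idea concrete, here is a minimal sketch of a value-guided assignment, not the paper's formulation. It assumes a tiny instance in which brute-force enumeration stands in for the MIP solver, and the names `dispatch`, `travel_time`, `future_value`, and `weight` are hypothetical. Each incident is assigned a distinct patroller so that travel time plus a weighted estimate of the value lost by pulling that patroller off patrol is minimized.

```python
from itertools import permutations


def dispatch(travel_time, future_value, weight=1.0):
    """Return (assignment, cost) minimizing travel time plus lost patrol value.

    travel_time[p][i]: time for patroller p to reach incident i.
    future_value[p]:   hypothetical estimate of the value lost by taking
                       patroller p off patrol (stand-in for a learned value).
    Brute-force enumeration replaces the MIP solver; only viable when tiny.
    """
    n_patrollers = len(travel_time)
    n_incidents = len(travel_time[0])
    best_cost, best_assignment = float("inf"), None
    # chosen[i] is the patroller sent to incident i; patrollers are distinct,
    # so this requires n_incidents <= n_patrollers.
    for chosen in permutations(range(n_patrollers), n_incidents):
        cost = sum(
            travel_time[p][i] + weight * future_value[p]
            for i, p in enumerate(chosen)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, chosen
    return best_assignment, best_cost


# Three patrollers, two incidents (made-up numbers for illustration).
travel_time = [[2, 9], [4, 3], [8, 1]]
myopic = dispatch(travel_time, future_value=[0, 0, 0])
value_aware = dispatch(travel_time, future_value=[5, 0, 0])
```

With zero lost-value estimates the nearest units are sent; once patroller 0 is marked as costly to remove from patrol, the assignment shifts to patroller 1 even though it is farther from the first incident.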

Abstract

Police patrol units need to split their time between performing preventive patrol and being dispatched to serve emergency incidents. In the existing literature, patrol and dispatch decisions are often studied separately. We consider joint optimization of these two decisions to improve police operations efficiency and reduce response time to emergency calls. Methodology/results: We propose a novel method for jointly optimizing multi-agent patrol and dispatch to learn policies yielding rapid response times. Our method treats each patroller as an independent Q-learner (agent) with a shared deep Q-network that represents the state-action values. The dispatching decisions are chosen using mixed-integer programming and value function approximation from combinatorial action spaces. We demonstrate that this heterogeneous multi-agent reinforcement learning approach is capable of learning joint policies that outperform those optimized for patrol or dispatch alone. Managerial Implications: Policies jointly optimized for patrol and dispatch can lead to more effective service while targeting demonstrably flexible objectives, such as those encouraging efficiency and equity in response.
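
The parameter-sharing idea described above can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' implementation: a tabular Q-function stands in for the shared deep Q-network, the environment is a hypothetical line of five nodes with a single hot spot, and the names `SharedQ`, `step`, and `train` are invented for illustration. Each patroller acts on its own observation only (independent learning), but all patrollers read from and write to the same value table.

```python
import random
from collections import defaultdict

N_NODES, HOT_SPOT = 5, 4  # hypothetical toy beat: nodes 0..4, incidents at node 4
LEFT, RIGHT = 0, 1


class SharedQ:
    """Tabular stand-in for a Q-network shared by all patrollers."""

    def __init__(self, n_actions=2, alpha=0.5, gamma=0.9):
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def greedy(self, obs):
        values = self.q[obs]
        return max(range(self.n_actions), key=values.__getitem__)

    def act(self, obs, epsilon, rng):
        # Epsilon-greedy exploration.
        if rng.random() < epsilon:
            return rng.randrange(self.n_actions)
        return self.greedy(obs)

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def step(pos, action):
    """Move along the line; reward is the negative distance to the hot spot."""
    new_pos = min(N_NODES - 1, pos + 1) if action == RIGHT else max(0, pos - 1)
    return new_pos, -abs(new_pos - HOT_SPOT)


def train(n_patrollers=2, episodes=500, horizon=10, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    shared = SharedQ()
    for _ in range(episodes):
        positions = [rng.randrange(N_NODES) for _ in range(n_patrollers)]
        for _ in range(horizon):
            for k in range(n_patrollers):
                # Each patroller sees only its own position and ignores the
                # others, yet every update goes into the one shared table.
                action = shared.act(positions[k], epsilon, rng)
                nxt, reward = step(positions[k], action)
                shared.update(positions[k], action, reward, nxt)
                positions[k] = nxt
    return shared


shared = train()
policy = [shared.greedy(s) for s in range(N_NODES)]
```

Sharing one table means experience gathered by any patroller improves the policy of all of them, which is the main appeal of parameter sharing when agents are interchangeable. In this toy the learned greedy policy moves every patroller toward the hot spot.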

Paper Structure

This paper contains 43 sections, 17 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: The joint task of patrol and dispatch involves decisions such as that demonstrated above. It is unclear whether it is best to move one patroller across beat boundaries to manage an incident, leaving its beat un-managed, or to require the patroller responsible for that beat to travel a long distance to the scene.
  • Figure 2: Example of a graph with two beats indicated by color. The dotted edges indicate that patrollers can travel between beats only when responding to calls.
  • Figure 3: Loss of the dispatch and patrol value networks over the course of alternating joint optimization in the high call volume setting (Section \ref{sec:efficient_response_experiment}).
  • Figure 4: Response time distributions (a) for all incidents and (b)-(c) for each incident category in the high call volume setting of Section \ref{sec:efficient_response_experiment}.
  • Figure 5: Response time distributions (a) for all incidents and (b)-(c) for each incident category in the low call volume setting of Section \ref{sec:efficient_response_experiment}.
  • ...and 7 more figures