Optimization-Augmented Machine Learning for Vehicle Operations in Emergency Medical Services
Maximiliane Rautenstrauß, Maximilian Schiffer
TL;DR
This work introduces a combinatorial optimization (CO)-augmented ML pipeline that learns online ambulance dispatching and redeployment policies for EMS systems. It combines an offline full-information MILP, which generates training data, with an end-to-end learned parameterization that feeds a combinatorial optimization layer. The method uses a structured imitation-learning objective and a perturbed Fenchel-Young loss to train predictors that guide a weighted bipartite matching solution, achieving reductions of up to 30% in mean response time on San Francisco 911 data and runtime savings of up to 87.9% for data augmentation. Dedicated weekday/weekend policies offer limited gains over daily-trained models, while augmenting the training set with states visited by suboptimal policies substantially improves performance in resource-constrained settings. Compared with DAGGER-based training, the proposed a priori data augmentation delivers near-competitive results with far greater scalability and lower computational cost. Overall, the approach offers a practical, scalable way to improve anticipatory EMS decision-making.
Abstract
Minimizing response times to meet legal requirements and serve patients in a timely manner is crucial for Emergency Medical Service (EMS) systems. Achieving this goal requires optimizing operational decision-making to manage ambulances efficiently. Against this background, we study a centrally controlled EMS system for which we learn an online ambulance dispatching and redeployment policy that aims to minimize the mean response time of ambulances within the system. The policy dispatches an ambulance upon receiving an emergency call and redeploys it to a waiting location upon completion of its service. We propose a novel combinatorial optimization-augmented machine learning pipeline that enables learning efficient policies for ambulance dispatching and redeployment. In this context, we further show how to solve the underlying full-information problem to generate training data, and we propose an augmentation scheme that improves our pipeline's generalization performance by mitigating a possible distribution mismatch with respect to the considered state space. Compared to existing methods that rely on augmentation during training, our approach offers substantial runtime savings of up to 87.9% while yielding competitive performance. To evaluate the performance of our pipeline against current industry practices, we conduct a numerical case study using San Francisco's 911 call data. Results show that the learned policies outperform the online benchmarks across various resource and demand scenarios, yielding a reduction in mean response time of up to 30%.
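To make the training signal mentioned in the TL;DR more concrete, the following is a minimal sketch of a perturbed Fenchel-Young loss over a weighted bipartite matching layer. It is not the authors' implementation. The function names (`solve_matching`, `perturbed_fy_loss_and_grad`), the Gaussian perturbation, the sample count, and the convention that the matching maximizes predicted edge weights are all assumptions made for illustration. The sketch relies on the standard result that the gradient of this loss with respect to the weights is the average perturbed solution minus the target solution.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def solve_matching(theta):
    """Return the 0/1 assignment matrix that maximizes sum(theta * y).

    Assumption: theta[i, j] is a predicted score for matching row i
    (e.g., a vehicle) to column j (e.g., a request or waiting location).
    """
    rows, cols = linear_sum_assignment(theta, maximize=True)
    y = np.zeros(theta.shape, dtype=float)
    y[rows, cols] = 1.0
    return y


def perturbed_fy_loss_and_grad(theta, y_target, epsilon=1.0, n_samples=50, seed=0):
    """Monte Carlo estimate of the perturbed Fenchel-Young loss and its gradient.

    loss ~ E[max_y <theta + eps * Z, y>] - <theta, y_target>
    grad ~ E[argmax_y <theta + eps * Z, y>] - y_target
    Z is standard Gaussian noise (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    y_mean = np.zeros(theta.shape, dtype=float)
    value_mean = 0.0
    for _ in range(n_samples):
        theta_pert = theta + epsilon * rng.standard_normal(theta.shape)
        y = solve_matching(theta_pert)
        y_mean += y / n_samples
        value_mean += np.sum(theta_pert * y) / n_samples
    loss = value_mean - np.sum(theta * y_target)
    grad = y_mean - y_target
    return loss, grad


if __name__ == "__main__":
    # Hypothetical 3x3 example: the target matching is the identity assignment.
    theta = np.zeros((3, 3))
    y_target = np.eye(3)
    for _ in range(20):
        loss, grad = perturbed_fy_loss_and_grad(theta, y_target)
        theta -= 0.5 * grad  # plain gradient step on the weights
    print(solve_matching(theta))
```

In a full pipeline, `theta` would be produced by a statistical model from the system state, and the gradient would be backpropagated into that model's parameters; here the weights are updated directly to keep the sketch short.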
