Decision-aware training of spatiotemporal forecasting models to select a top K subset of sites for intervention

Kyle Heuton, F. Samuel Muench, Shikhar Shrestha, Thomas J. Stopka, Michael C. Hughes

TL;DR

This paper tackles allocating scarce intervention resources across many spatial sites using spatiotemporal forecasts, with decisions evaluated by a recent decision-centric metric, the fraction of best possible reach (BPR). It develops a ratio-estimator ranking to produce top-$K$ site selections and a training framework called decision-aware maximum likelihood (DAML) that balances predictive likelihood with a BPR constraint, using perturbed optimizers to enable gradient-based learning through discrete top-$K$ decisions. The authors demonstrate that standard maximum likelihood can be suboptimal for decision quality, while DAML and direct BPR optimization improve top-$K$ decisions with varying effects on forecast likelihood, across synthetic data and real-world applications in opioid overdose forecasting and wildlife monitoring. Collectively, the work provides practical methods and theoretical justification for ranking sites and training spatiotemporal models to optimize top-$K$ interventions in public health and conservation planning.

Abstract

Optimal allocation of scarce resources is a common problem for decision makers faced with choosing a limited number of locations for intervention. Spatiotemporal prediction models could make such decisions data-driven. A recent performance metric called fraction of best possible reach (BPR) measures the impact of using a model's recommended size K subset of sites compared to the best possible top-K in hindsight. We tackle two open problems related to BPR. First, we explore how to rank all sites numerically given a probabilistic model that predicts event counts jointly across sites. Ranking via the per-site mean is suboptimal for BPR. Instead, we offer a better ranking for BPR backed by decision theory. Second, we explore how to train a probabilistic model's parameters to maximize BPR. Discrete selection of K sites implies all-zero parameter gradients which prevent standard gradient training. We overcome this barrier via advances in perturbed optimizers. We further suggest a training objective that combines likelihood with a decision-aware BPR constraint to deliver high-quality top-K rankings as well as good forecasts for all sites. We demonstrate our approach on two where-to-intervene applications: mitigating opioid-related fatal overdoses for public health and monitoring endangered wildlife.
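To make the BPR definition above concrete, here is a minimal NumPy sketch of how the metric could be computed for a single time period. It follows the verbal description in the abstract (events reached by the model's top-K sites, divided by events reached by the best top-K in hindsight). The function names `top_k_sites` and `fraction_best_possible_reach`, the tie-breaking rule, and the convention of returning 1.0 when no events occur are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def top_k_sites(scores, k):
    """Return indices of the k highest-scoring sites.

    Assumption: ties are broken in favor of the lower site index.
    """
    scores = np.asarray(scores, dtype=float)
    # Stable sort on negated scores gives descending order with
    # deterministic tie-breaking.
    order = np.argsort(-scores, kind="stable")
    return order[:k]


def fraction_best_possible_reach(y_true, scores, k):
    """Events reached by the model's top-k sites, relative to the
    events reached by the best possible top-k chosen in hindsight."""
    y_true = np.asarray(y_true, dtype=float)
    chosen = top_k_sites(scores, k)   # sites the model recommends
    best = top_k_sites(y_true, k)     # sites that were best in hindsight
    best_reach = y_true[best].sum()
    if best_reach == 0:
        # Assumption: with no events anywhere, every selection is
        # equally good, so we report a perfect score.
        return 1.0
    return y_true[chosen].sum() / best_reach


# Hypothetical observed event counts at 6 sites and model scores per site.
y_true = np.array([0, 5, 2, 9, 1, 3])
scores = np.array([0.1, 4.0, 3.5, 2.0, 0.2, 6.0])

bpr = fraction_best_possible_reach(y_true, scores, k=3)
# Model picks sites 5, 1, 2 (reach 3 + 5 + 2 = 10); the best in
# hindsight is sites 3, 1, 5 (reach 9 + 5 + 3 = 17), so BPR = 10/17.
print(round(bpr, 3))
```

Because the selection step is an argsort followed by a hard cutoff, small changes in `scores` usually leave the chosen set unchanged, which illustrates why the abstract notes that gradients of BPR with respect to model parameters are zero without a smoothing device such as perturbed optimizers.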

Paper Structure

This paper contains 28 sections, 20 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Visual overview of our approach and contributions to the how-to-rank and how-to-train open problems.
  • Figure 2: Synthetic 1D data: learned models and Pareto frontier. Left, Row 1: histograms of $y_s$ values by site (circled numbers). Sites 3-7 should be the top $K=5$ under the true model. Left, Rows 2-4: learned Gaussian components, with site-specific weights $\bm{\pi}_s$ marked as horizontal position between pure green and blue. Text provides ranking $r$ with how often that site is in the top $K=5$ over 200 trials. Right: likelihood vs. BPR tradeoff frontier for final models delivered by different training objectives.
  • Figure 3: Pareto frontier of best possible reach (BPR, x-axis) and log likelihood (LL, y-axis) for real-world tasks. Higher is better on both axes. Each panel shows how the final models estimated by different training methods score on the test set of a forecasting task defined in Sec. \ref{sec:results}. To capture the stochasticity of BPR due to our sampling-based ranking estimator, for each model we show an estimated density for BPR over 1000 trials. Uncertainty in this plot corresponds only to uncertainty in BPR; log likelihood is a point estimate. In all three tasks, our proposed decision-aware ML (DAML) delivers better top-K decisions as measured by BPR than ML estimation. DAML also delivers likelihood comparable to ML methods and much better than directly optimizing BPR. In the Cranes dataset, the DAML objective surprisingly offers better BPR than the BPR-only objective, although the magnitude of this difference is small and perhaps due to the small scale and sparsity of this dataset.