Conformal Risk Control for Safety-Critical Wildfire Evacuation Mapping: A Comparative Study of Tabular, Spatial, and Graph-Based Models

Baljinnyam Dayan

Abstract

Wildfire prediction models deployed today share a dangerous property: none provides a formal guarantee on how much fire spread it misses. Despite extensive work on wildfire spread prediction using deep learning, no prior study has applied distribution-free safety guarantees to this domain, leaving evacuation planners reliant on probability thresholds with no formal assurance. We address this gap by presenting, to our knowledge, the first application of conformal risk control (CRC) to wildfire spread prediction, providing a finite-sample guarantee on the false negative rate (FNR <= 0.05). We expose a stark failure: across three model families of increasing complexity (tabular: LightGBM, AUROC 0.854; convolutional: Tiny U-Net, AUROC 0.969; and graph-based: Hybrid ResGNN-UNet, AUROC 0.964), standard thresholds capture only 7-72% of true fire spread. CRC eliminates this failure uniformly. Our central finding is that model architecture determines evacuation efficiency, while CRC determines safety. With CRC, both spatial models achieve approximately 95% fire coverage while flagging only approximately 15% of total pixels, making them 4.2x more efficient than LightGBM. The graph model's additional complexity over a simple U-Net yields no meaningful efficiency gain. We propose a shift-aware three-way CRC framework that assigns SAFE/MONITOR/EVACUATE zones for operational triage, and we characterize a fundamental limitation of prevalence-weighted bounds under extreme class imbalance (approximately 5% fire prevalence). All models, calibration code, and evaluation pipelines are released for reproducibility.

Paper Structure

This paper contains 41 sections, 7 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Why standard thresholds fail. A cross-section through a U-Net fire prediction shows probability peaking at ${\sim}0.8$ and decaying smoothly. The standard threshold ($\hat{p} \ge 0.5$, gray) misses the majority of the fire region; the CRC threshold ($\hat{\lambda} = 0.002$, green) captures it entirely.
  • Figure 2: FNR as a function of decision threshold for all three models. The red dashed line marks $\alpha = 0.05$. Both spatial models maintain safe FNR over a much wider threshold range than LightGBM, explaining their 4.2$\times$ smaller evacuation zones under CRC. The near-overlapping U-Net and ResGNN-UNet curves illustrate diminishing returns from architectural complexity. Markers show each model's CRC $\hat{\lambda}$.
  • Figure 3: Same U-Net, different thresholds. Standard thresholding ($\hat{p} \ge 0.5$) catches 56% of fires; CRC ($\hat{\lambda} = 0.002$) catches 100%. Bottom: probability cross-section showing how CRC captures low-probability fire pixels. Green shading marks pixels saved by CRC.
  • Figure 4: Three-model comparison. Left: AUROC. Center: without CRC, no model reaches 95% coverage. Right: with CRC, all three meet the safety target; both spatial models achieve ${\sim}4\times$ smaller evacuation zones than LightGBM.
  • Figure 5: The safety gap across three test samples (0.5%, 2.7%, 7.8% fire). Col. 1: ground truth. Col. 2: standard threshold, with missed fire in red (100%, 73%, 34% missed). Col. 3: CRC threshold, with improved detection (green). Col. 4: three-way zones (SAFE/MONITOR/EVACUATE).
  • ...and 5 more figures
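
The captions above contrast a fixed cutoff ($\hat{p} \ge 0.5$) with a calibrated CRC threshold $\hat{\lambda}$. As a rough illustration of how such a threshold could be chosen, the sketch below applies the standard conformal risk control rule to a held-out calibration set: it picks the largest $\lambda$ for which the inflated empirical risk $(n \hat{R}_n(\lambda) + 1)/(n + 1)$ stays at or below $\alpha$. This is not the paper's released code. The function name `crc_threshold`, the array layout (one row per calibration image), the per-image FNR loss, and the threshold grid are all assumptions made for illustration.

```python
import numpy as np


def crc_threshold(probs, labels, alpha=0.05, grid=None):
    """Pick a CRC threshold for the false negative rate (illustrative sketch).

    probs  : (n_images, n_pixels) predicted fire probabilities (assumed layout)
    labels : (n_images, n_pixels) ground-truth fire mask (0/1 or bool)
    A pixel is flagged when its probability is >= the returned threshold.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if grid is None:
        # Hypothetical default grid; any ascending grid of candidates works.
        grid = np.linspace(0.0, 1.0, 1001)

    n = probs.shape[0]
    fire_counts = labels.sum(axis=1)

    # Threshold 0.0 flags every pixel (FNR = 0), so it is the safe fallback
    # when no candidate satisfies the bound (e.g. too few calibration images).
    best = 0.0
    for lam in grid:
        # Fire pixels that fall below the threshold are missed.
        missed = (labels & (probs < lam)).sum(axis=1)
        # Per-image FNR in [0, 1]; images without fire contribute zero loss.
        fnr = np.where(fire_counts > 0, missed / np.maximum(fire_counts, 1), 0.0)
        # Standard CRC correction for a loss bounded by 1.
        bound = (fnr.sum() + 1.0) / (n + 1)
        if bound <= alpha:
            best = float(lam)
        else:
            break  # FNR only grows as the threshold rises, so stop here.
    return best
```

Calling `crc_threshold(cal_probs, cal_labels, alpha=0.05)` on calibration data and then flagging test pixels with `test_probs >= lam_hat` would mirror the CRC columns in the figures. The guarantee holds in expectation and relies on the calibration and test images being exchangeable, which is why the paper's shift-aware variant matters in practice.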