BCWildfire: A Long-term Multi-factor Dataset and Deep Learning Benchmark for Boreal Wildfire Risk Prediction

Zhengsen Xu, Sibo Cheng, Lanying Wang, Hongjie He, Wentao Sun, Jonathan Li, Lincoln Linlin Xu

TL;DR

BCWildfire is a large-scale benchmark for boreal wildfire risk prediction, with 25 years of daily data covering 240 million hectares and 38 driving factors, which supports long-term temporal modeling for ignition risk forecasting. The dataset harmonizes MODIS, ERA5-Land, OpenStreetMap, and DEM sources at 1 km daily resolution and defines a next-day prediction task with 2.4 million samples. Benchmarks of six model families spanning CNN, linear, Transformer, and Mamba architectures show that Transformer models and spatial embeddings improve predictive performance, but that performance is capped by class imbalance and the stochastic nature of ignition. SHAP analyses point to physically meaningful drivers such as recent fire activity, soil moisture, and vegetation indices. By offering a unified, multimodal time-series benchmark and an accompanying codebase, the resource supports future research in long-horizon wildfire prediction and practical risk management.

Abstract

Wildfire risk prediction remains a critical yet challenging task due to the complex interactions among fuel conditions, meteorology, topography, and human activity. Despite growing interest in data-driven approaches, publicly available benchmark datasets that support long-term temporal modeling, large-scale spatial coverage, and multimodal drivers remain scarce. To address this gap, we present a 25-year, daily-resolution wildfire dataset covering 240 million hectares across British Columbia and surrounding regions. The dataset includes 38 covariates, encompassing active fire detections, weather variables, fuel conditions, terrain features, and anthropogenic factors. Using this benchmark, we evaluate a diverse set of time-series forecasting models, including CNN-based, linear-based, Transformer-based, and Mamba-based architectures. We also investigate the effectiveness of position embeddings and the relative importance of different fire-driving factors. The dataset and the corresponding code are available at https://github.com/SynUW/mmFire
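To make the next-day prediction setup more concrete, the following is a minimal sketch of how daily covariates for a single location might be turned into (history window, next-day label) samples. It is not taken from the mmFire codebase: the function names, the list-based data layout, and the `lookback` length are illustrative assumptions, and the toy series uses 2 covariates per day rather than the dataset's 38.

```python
from typing import List, Tuple

Sample = Tuple[List[List[float]], int]


def make_next_day_samples(
    features: List[List[float]],  # features[t] = covariate vector on day t
    fire: List[int],              # fire[t] = 1 if a fire was detected on day t
    lookback: int,                # hypothetical history length in days
) -> List[Sample]:
    """Pair each lookback-day window with the fire label of the following day."""
    if len(features) != len(fire):
        raise ValueError("features and fire must cover the same days")
    samples: List[Sample] = []
    for end in range(lookback, len(features)):
        window = features[end - lookback:end]  # days end-lookback .. end-1
        label = fire[end]                      # the day right after the window
        samples.append((window, label))
    return samples


def positive_fraction(samples: List[Sample]) -> float:
    """Share of samples labeled as fire; a low value signals class imbalance."""
    if not samples:
        return 0.0
    return sum(label for _, label in samples) / len(samples)


# Toy example: 6 days, 2 covariates per day (assumed values, not real data).
features = [[float(t), float(t) * 0.5] for t in range(6)]
fire = [0, 0, 0, 0, 1, 0]
samples = make_next_day_samples(features, fire, lookback=3)
```

With this toy series, three samples are produced and only one of them is positive. A check like `positive_fraction` gives a quick view of how skewed the labels are before choosing a loss weighting or sampling strategy.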

Paper Structure

This paper contains 20 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Time series of key variables (2023-2024) showing wildfire prediction challenges. Panels (a)-(e): precipitation, soil moisture L1, MODIS Band 20, 2m temperature (exogenous), and burned area (endogenous) at an ignition point. Panels (f)-(j): the same variables in non-ignition areas. Red marks wildfire periods; green marks missing data. The figure demonstrates the spatiotemporal decoupling between exogenous and endogenous variables, as well as gaps in the remote sensing data.
  • Figure 2: Partial overview of the dataset, illustrating fuel (green), fire detection (red), topography (yellow), human activities (black), and meteorological factors (blue).
  • Figure 3: Comparison of key environmental drivers between burned and unburned areas prior to wildfire ignition. (a) Kernel density distributions of six selected variables within 10 days before ignition. (b) Mean temporal trends of the same variables.
  • Figure 4: Visualization of predictions from different time series forecasting models on August 15, 2024.