BCWildfire: A Long-term Multi-factor Dataset and Deep Learning Benchmark for Boreal Wildfire Risk Prediction

Zhengsen Xu; Sibo Cheng; Lanying Wang; Hongjie He; Wentao Sun; Jonathan Li; Lincoln Linlin Xu

TL;DR

BCWildfire is a large-scale boreal wildfire risk benchmark with 25 years of daily data covering 240 million hectares and 38 driving factors, enabling long-term temporal modeling for ignition risk forecasting. The dataset harmonizes MODIS, ERA5-Land, OpenStreetMap, and DEM sources on a 1 km daily grid and defines a next-day prediction task with 2.4 million samples. Benchmarking six model families spanning CNN, linear, Transformer, and Mamba architectures shows that Transformer models and spatial embeddings improve predictive performance, but gains are capped by class imbalance and the stochastic nature of ignition. SHAP analyses identify physically meaningful drivers such as recent fire activity, soil moisture, and vegetation indices. By offering a unified, multimodal time-series benchmark and accompanying codebase, the resource supports future research in long-horizon wildfire prediction and practical risk management.
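To make the next-day task concrete, the following is a minimal sketch of one way to turn a daily factor stack into (history, label) pairs. It is an illustration under stated assumptions, not the released pipeline: the nested-list layout `cube[t][f][y][x]`, the function names `make_next_day_samples` and `positive_rate`, the look-back `window`, and the rule that a pixel is positive when its active-fire channel (`fire_idx`) exceeds zero on the target day are all hypothetical. Because ignitions are rare, the positive rate it reports also hints at the class imbalance noted in the TL;DR.

```python
from typing import List, Tuple

# Hypothetical layout (an assumption, not the released file format):
# cube[t][f][y][x] = value of factor f on day t at grid cell (y, x).
Cube = List[List[List[List[float]]]]
Sample = Tuple[List[List[float]], int]


def make_next_day_samples(cube: Cube, fire_idx: int, window: int) -> List[Sample]:
    """Pair `window` days of all factors at one pixel with a next-day fire label.

    The label is 1 when the (assumed) active-fire channel `fire_idx` is > 0
    on the target day t; the history covers days t - window .. t - 1.
    """
    n_days, n_factors = len(cube), len(cube[0])
    height, width = len(cube[0][0]), len(cube[0][0][0])
    samples: List[Sample] = []
    for t in range(window, n_days):  # t is the day being predicted
        for y in range(height):
            for x in range(width):
                history = [[cube[d][f][y][x] for f in range(n_factors)]
                           for d in range(t - window, t)]
                label = 1 if cube[t][fire_idx][y][x] > 0 else 0
                samples.append((history, label))
    return samples


def positive_rate(samples: List[Sample]) -> float:
    """Fraction of samples labelled as next-day fire."""
    return sum(label for _, label in samples) / len(samples)


# Toy stack: 4 days, 2 factors (factor 0 = fire channel), a 1 x 2 grid.
cube: Cube = [
    [[[0.0, 0.0]], [[0.1, 0.2]]],
    [[[1.0, 0.0]], [[0.3, 0.4]]],
    [[[0.0, 0.0]], [[0.5, 0.6]]],
    [[[0.0, 1.0]], [[0.7, 0.8]]],
]
samples = make_next_day_samples(cube, fire_idx=0, window=2)
print(len(samples), positive_rate(samples))  # 4 0.25
```

Keeping the fire channel inside the history mirrors the finding that recent fire activity is an informative driver; at the real dataset's scale, array libraries would replace these nested lists.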

Abstract

Wildfire risk prediction remains a critical yet challenging task due to the complex interactions among fuel conditions, meteorology, topography, and human activity. Despite growing interest in data-driven approaches, publicly available benchmark datasets that combine long-term temporal records, large-scale spatial coverage, and multimodal driving factors remain scarce. To address this gap, we present a 25-year, daily-resolution wildfire dataset covering 240 million hectares across British Columbia and surrounding regions. The dataset includes 38 covariates, encompassing active fire detections, weather variables, fuel conditions, terrain features, and anthropogenic factors. Using this benchmark, we evaluate a diverse set of time-series forecasting models, including CNN-based, linear-based, Transformer-based, and Mamba-based architectures. We also investigate the effectiveness of position embeddings and the relative importance of different fire-driving factors. The dataset and code are available at https://github.com/SynUW/mmFire
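The abstract does not specify which position embedding is evaluated. As one common option, the sketch below assumes a fixed Transformer-style sinusoidal code of a pixel's grid row and column, appended to every time step of that pixel's input so that a per-pixel time-series model can tell where it is. The function names `sinusoidal_1d`, `spatial_embedding`, and `add_position`, and the use of grid indices (rather than, say, latitude and longitude or a learned embedding), are illustrative assumptions.

```python
import math
from typing import List


def sinusoidal_1d(pos: float, dim: int) -> List[float]:
    """Sinusoidal code of one coordinate: interleaved sin/cos at geometric frequencies."""
    if dim % 2:
        raise ValueError("dim must be even")
    code: List[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        code.append(math.sin(pos * freq))
        code.append(math.cos(pos * freq))
    return code


def spatial_embedding(row: int, col: int, dim: int) -> List[float]:
    """2-D code: the first dim/2 entries encode the row, the rest the column."""
    if dim % 4:
        raise ValueError("dim must be a multiple of 4")
    return sinusoidal_1d(row, dim // 2) + sinusoidal_1d(col, dim // 2)


def add_position(history: List[List[float]], row: int, col: int,
                 dim: int) -> List[List[float]]:
    """Append the same spatial code to every day of one pixel's factor history."""
    code = spatial_embedding(row, col, dim)
    return [day + code for day in history]


history = [[0.2, 0.5], [0.3, 0.4]]  # 2 days x 2 factors for one pixel (toy values)
augmented = add_position(history, row=10, col=42, dim=8)
print(len(augmented), len(augmented[0]))  # 2 10
```

A learned embedding table indexed by pixel would be a drop-in alternative; the fixed sinusoidal code adds no trainable parameters.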
