XGBoost meets INLA: a two-stage spatio-temporal forecasting of wildfires in Portugal

Chenglei Hu, Regina Baltazar Bispo, Håvard Rue, Carlos C. DaCamara, Ben Swallow, Daniela Castro-Camilo

Abstract

Wildfires pose a major threat to Portugal, with over 115,000 hectares burned annually on average during 1980-2024, and the country has faced devastating mega-fires such as those in 2017. Accurate forecasts of wildfire occurrence and burned area are therefore essential for firefighting resource allocation and emergency preparedness. In this study, we propose a novel two-stage ensemble that extends the widely used latent Gaussian modelling framework with integrated nested Laplace approximation (INLA) for spatio-temporal wildfire forecasting. Stage 1 applies a gradient boosting model (XGBoost) to environmental covariates and historical fire records to produce one-month-ahead point forecasts of fire counts and burned area. Stage 2 uses these predictions as external covariates in a latent Gaussian model with additional spatiotemporal random effects to generate probabilistic forecasts of monthly total fire counts and burned area at the council level. To capture both moderate and extreme events, we implement the extended generalised Pareto (eGP) likelihood (a sub-asymptotic distribution) within INLA, develop Penalised Complexity (PC) priors for its parameters, and compare the eGP likelihood with common alternatives (e.g., Gamma and Weibull). Our framework tackles the unavailability of future environmental covariates at prediction time and performs strongly for one-month-ahead forecasts.

Paper Structure

This paper contains 38 sections, 56 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Histograms of fire count and burned area at the council-month level, highlighting the prevalence of zeros. Quantities are rescaled for visualisation purposes using a square-root transformation for fire counts and a logarithmic transformation for burned area.
  • Figure 2: Top: Average council-level fire count and burned area. Bottom: Monthly average fire count and burned area across all councils. Spatial and seasonal variation is evident. Quantities are rescaled for visualisation purposes using a square-root transformation for fire counts and a logarithmic transformation for burned area.
  • Figure 3: Diagram of the proposed two-stage wildfire forecasting framework combining XGBoost and INLA.
  • Figure 4: Comparison of naive and window-based modelling configurations. Arrows indicate the direction of information flow from predictors to targets.
  • Figure 5: Time-series–aware cross-validation is implemented using yearly blocks for both hyperparameter tuning and the generation of one-month-ahead forecasts. The first three years are excluded from the training set because feature construction based on a 36-month rolling window results in missing values during this period.
  • ...and 9 more figures
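
The Figure 5 caption describes two mechanics that a small example may make concrete: features built from a 36-month rolling window leave the first three years without usable values, and cross-validation proceeds in yearly blocks so that training data always precede the test year. The sketch below is illustrative only and is not the authors' implementation. The rolling mean as the feature, the expanding training window, the toy series, and the helper names `rolling_mean_feature` and `yearly_block_splits` are all assumptions made here.

```python
from statistics import mean

WINDOW = 36  # months of history per feature, as stated in the Figure 5 caption


def rolling_mean_feature(series, window=WINDOW):
    """Mean of the previous `window` values; None where history is too short.

    Only values strictly before month t are used, so the feature is
    available when forecasting month t one month ahead.
    """
    feature = []
    for t in range(len(series)):
        if t < window:
            feature.append(None)  # not enough history yet
        else:
            feature.append(mean(series[t - window:t]))
    return feature


def yearly_block_splits(years, first_test_year):
    """Yield (test_year, train_idx, test_idx), one held-out year per fold.

    Assumption: an expanding window, i.e. all earlier years form the
    training set. Indices refer to positions in `years`.
    """
    for test_year in range(first_test_year, max(years) + 1):
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield test_year, train_idx, test_idx


# Toy monthly series covering 2000-2005 (hypothetical data, not from the paper).
years = [2000 + m // 12 for m in range(72)]
counts = [float(m % 12) for m in range(72)]

feature = rolling_mean_feature(counts)

# Drop the months whose rolling-window feature is missing (the first 3 years).
usable = [i for i, f in enumerate(feature) if f is not None]
usable_years = [years[i] for i in usable]

folds = list(yearly_block_splits(usable_years, first_test_year=2004))
for test_year, train_idx, test_idx in folds:
    print(test_year, len(train_idx), len(test_idx))
```

In this toy setup the months from 2000 to 2002 drop out because their features are missing, and each fold trains only on years before the held-out year, which avoids leaking future information into the forecast.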