Explainable AI Integrated Feature Engineering for Wildfire Prediction
Di Fan, Ayan Biswas, James Paul Ahrens
TL;DR
This work tackles wildfire prediction by evaluating classification (safe/unsafe) and regression (inside/outside burned area) tasks using XGBoost, Random Forest, and a CNN that fuses numeric and image data. It applies multiple Explainable AI tools (TreeSHAP, LIME, PDP, and Grad-CAM) to expose feature influences and model decisions, identifying wind_speed and soil moisture as key drivers. On the Yosemite wildfire dataset, which serves as the benchmark, XGBoost is the top classifier and Random Forest is the best regressor, while the CNN offers a competitive, unified approach to both tasks. Overall, the study shows that pairing accurate predictive models with interpretability techniques can improve trust and operational utility in wildfire risk assessment and management.
Abstract
Wildfires present intricate challenges for prediction, necessitating sophisticated machine learning techniques for effective modeling~\cite{jain2020review}. In this work, we thoroughly assess several machine learning algorithms on both classification and regression tasks relevant to wildfire prediction. For classifying different types or stages of wildfires, the XGBoost model outperformed the others in accuracy and robustness. For predicting the extent of wildfire-affected areas, the Random Forest regression model performed best in both prediction error and explained variance. We also developed a hybrid neural network model that integrates numerical data and image information to perform classification and regression simultaneously. To understand the decision-making processes of these models and identify the key contributing features, we used eXplainable Artificial Intelligence (XAI) techniques, including TreeSHAP, LIME, Partial Dependence Plots (PDP), and Gradient-weighted Class Activation Mapping (Grad-CAM). These interpretability tools reveal the significance and interplay of the features, highlighting the complex factors that influence wildfire predictions. Our study not only demonstrates the effectiveness of specific machine learning models on wildfire-related tasks but also underscores the critical role of model transparency and interpretability in environmental science applications.
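
As a rough illustration of one of the interpretability techniques named above, the following minimal sketch computes a partial dependence curve by hand. It is not the authors' pipeline: `toy_risk_model`, its coefficients, and the synthetic rows are hypothetical stand-ins for a fitted model and the real dataset, and only the feature names `wind_speed` and soil moisture are borrowed from the text. The idea it shows is the general one behind PDP: fix one feature at a grid value for every sample, average the model's predictions, and repeat across the grid.

```python
import random


def toy_risk_model(wind_speed, soil_moisture):
    """Hypothetical stand-in for a fitted model (NOT from the paper):
    risk rises with wind speed and falls with soil moisture."""
    score = 0.08 * wind_speed - 1.5 * soil_moisture + 0.3
    return min(1.0, max(0.0, score))  # clip to [0, 1]


def partial_dependence(model, rows, feature_index, grid):
    """For each grid value, overwrite one feature in every row
    and average the model's predictions over all rows."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += model(*modified)
        curve.append(total / len(rows))
    return curve


random.seed(0)
# Synthetic rows of (wind_speed in m/s, soil_moisture as a fraction); not real data.
rows = [(random.uniform(0, 15), random.uniform(0.05, 0.45)) for _ in range(200)]

wind_grid = [0, 5, 10, 15]
pdp_wind = partial_dependence(toy_risk_model, rows, 0, wind_grid)

for value, avg in zip(wind_grid, pdp_wind):
    print(f"wind_speed={value:>2} -> mean predicted risk {avg:.3f}")
```

With a real tree ensemble, the same averaging would be applied to the fitted model's prediction function in place of the toy function; the curve then shows how the average prediction moves as a single feature is varied while the others keep their observed values.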
