Interpretable Machine Learning for Reservoir Water Temperatures in the U.S. Red River Basin of the South

Isabela Suaza-Sierra; Hernan A. Moreno; Luis A De la Fuente; Thomas M. Neeson

TL;DR

This study addresses the interpretability gap in reservoir water temperature (RWT) modeling by combining explainable ML with Kolmogorov–Arnold Networks (KANs) across ten reservoirs in the Red River Basin, using more than 10,000 depth-resolved profiles. Random Forest, XGBoost, and Multilayer Perceptron models achieve high predictive accuracy (R^2 ≈ 0.94–0.97; RMSE ≈ 1.20–1.83 °C), and SHAP identifies air temperature (notably the 7-day antecedent mean) and depth as the primary drivers. The authors then translate these data-driven insights into compact symbolic equations via two KAN sets (simple and complex). The comparison reveals a trade-off between interpretability and accuracy: simple, physically grounded forms capture the dominant RWT dynamics, while complex forms offer higher predictive power. The framework shows that interpretable ML complemented by symbolic modeling can yield transferable, transparent surrogates for reservoir thermal dynamics, supporting decision-making under climate variability and resource constraints.

Abstract

Accurate prediction of Reservoir Water Temperature (RWT) is vital for sustainable water management, ecosystem health, and climate resilience. Yet, prediction alone offers limited insight into the governing physical processes. To bridge this gap, we integrated explainable machine learning (ML) with symbolic modeling to uncover the drivers of RWT dynamics across ten reservoirs in the Red River Basin, USA, using over 10,000 depth-resolved temperature profiles. We first employed ensemble and neural models, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), achieving high predictive skill (best RMSE = 1.20 °C, R^2 = 0.97). Using SHAP (SHapley Additive exPlanations), we quantified the contribution of physical drivers such as air temperature, depth, wind, and lake volume, revealing consistent patterns across reservoirs. To translate these data-driven insights into compact analytical expressions, we developed Kolmogorov–Arnold Networks (KANs) to symbolically approximate RWT. Ten progressively complex KAN equations were derived, improving from R^2 = 0.84 using a single predictor (7-day antecedent air temperature) to R^2 = 0.92 with ten predictors, though gains diminished beyond five, highlighting a balance between simplicity and accuracy. The resulting equations, dominated by linear and rational forms, incrementally captured nonlinear behavior while preserving interpretability. Depth consistently emerged as a secondary but critical predictor, whereas precipitation had limited effect. By coupling predictive accuracy with explanatory power, this framework demonstrates how KANs and explainable ML can transform black-box models into transparent surrogates that advance both prediction and understanding of reservoir thermal dynamics.
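The abstract names a 7-day antecedent air temperature predictor and reports skill as RMSE and R^2. The sketch below illustrates how such quantities could be computed with the Python standard library only. It is not the authors' code. The trailing-window definition (the 7 days ending on the current day), the function names, and the one-predictor least-squares line are all assumptions made for illustration, and the line is a stand-in for a simple linear form rather than one of the paper's KAN equations.

```python
from math import sqrt


def antecedent_mean(values, window=7):
    """Trailing mean over `window` days ending on the current day.

    Assumed definition: the source does not state whether the current
    day is included. Returns None where fewer than `window` values exist.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic values for illustration only; not data from the study.
    air_temp = [10, 12, 14, 16, 18, 20, 22, 24, 26, 25, 23, 21]
    water_temp = [9.5, 10.0, 10.8, 11.9, 13.0, 14.2, 15.1, 16.3, 17.4, 18.0, 18.3, 18.1]

    feature = antecedent_mean(air_temp)
    # Keep only days where the 7-day feature is defined.
    pairs = [(f, w) for f, w in zip(feature, water_temp) if f is not None]
    x = [f for f, _ in pairs]
    y = [w for _, w in pairs]

    slope, intercept = fit_line(x, y)
    predicted = [slope * xi + intercept for xi in x]
    print(f"RMSE = {rmse(y, predicted):.2f}, R^2 = {r_squared(y, predicted):.2f}")
```

Evaluating on the same points used for fitting, as this toy example does, overstates skill. A held-out set of profiles or reservoirs would be needed for a fair comparison with the reported figures.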
