Interpretable Machine Learning for Reservoir Water Temperatures in the U.S. Red River Basin of the South

Isabela Suaza-Sierra, Hernan A. Moreno, Luis A De la Fuente, Thomas M. Neeson

TL;DR

This study tackles the interpretability gap in reservoir water temperature (RWT) modeling by integrating explainable ML with Kolmogorov–Arnold Networks (KANs) across ten reservoirs in the Red River Basin, using more than 10,000 depth-resolved profiles. Random Forest, XGBoost, and Multilayer Perceptron achieve high predictive accuracy (R^2 ≈ 0.94–0.97; RMSE ≈ 1.20–1.83°C), with SHAP identifying air temperature (notably the 7-day antecedent mean) and depth as primary drivers. The authors translate data-driven insights into compact symbolic equations via two KAN sets (simple and complex), revealing a trade-off between interpretability and accuracy, and showing that simple, physically grounded forms capture dominant RWT dynamics while complex forms offer higher predictive power. The framework demonstrates that interpretable ML complemented by symbolic modeling can yield transferable, transparent surrogates for reservoir thermal dynamics, supporting decision-making under climate variability and resource constraints.
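The two skill scores quoted above, RMSE and R^2, can be computed from paired observed and predicted temperatures. The following is a minimal sketch using only the Python standard library; the function names and the sample values are illustrative assumptions, not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. °C)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical water temperatures in °C (not data from the paper).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(round(rmse(observed, predicted), 4))       # 0.7071
print(round(r_squared(observed, predicted), 4))  # 0.9
```

RMSE reports the typical error in degrees, while R^2 reports the share of observed variance the model reproduces, which is why the two are usually quoted together.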

Abstract

Accurate prediction of Reservoir Water Temperature (RWT) is vital for sustainable water management, ecosystem health, and climate resilience. Yet prediction alone offers limited insight into the governing physical processes. To bridge this gap, we integrated explainable machine learning (ML) with symbolic modeling to uncover the drivers of RWT dynamics across ten reservoirs in the Red River Basin, USA, using over 10,000 depth-resolved temperature profiles. We first employed ensemble and neural models, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), achieving high predictive skill (best RMSE = 1.20 °C, R^2 = 0.97). Using SHAP (SHapley Additive exPlanations), we quantified the contribution of physical drivers such as air temperature, depth, wind, and lake volume, revealing consistent patterns across reservoirs. To translate these data-driven insights into compact analytical expressions, we developed Kolmogorov–Arnold Networks (KANs) to symbolically approximate RWT. Ten progressively complex KAN equations were derived, improving from R^2 = 0.84 using a single predictor (7-day antecedent air temperature) to R^2 = 0.92 with ten predictors, though gains diminished beyond five, highlighting a balance between simplicity and accuracy. The resulting equations, dominated by linear and rational forms, incrementally captured nonlinear behavior while preserving interpretability. Depth consistently emerged as a secondary but critical predictor, whereas precipitation had limited effect. By coupling predictive accuracy with explanatory power, this framework demonstrates how KANs and explainable ML can transform black-box models into transparent surrogates that advance both prediction and understanding of reservoir thermal dynamics.
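The single-predictor equation relies on the 7-day antecedent air temperature. One plausible way to build such a feature from a daily series is a trailing mean, sketched below. The exact window definition is an assumption here: this version averages the current day and the six days before it, and the abstract does not state whether the study includes the current day. The function name and sample values are hypothetical.

```python
def antecedent_mean(daily_values, window=7):
    """Trailing mean over `window` days, ending at (and including) each day.

    Returns None for days that do not yet have a full window of history.
    Assumption: the window includes the current day; shift the result by
    one position if the antecedent period should exclude it.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    running_sum = 0.0
    for i, value in enumerate(daily_values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= daily_values[i - window]
        result.append(running_sum / window if i >= window - 1 else None)
    return result


# Hypothetical daily mean air temperatures in °C (not data from the paper).
air_temp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(antecedent_mean(air_temp))
# [None, None, None, None, None, None, 4.0, 5.0]
```

Averaging over several days smooths out short-lived weather swings, which is consistent with water responding more slowly than air to changes in heating.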

Paper Structure

This paper contains 24 sections, 16 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Map of study reservoirs within the Red River Basin divide. The main Red River channel is shown in dark blue, with major tributaries depicted in light blue. Lakes and reservoirs are represented by filled circles: black-filled circles indicate major reservoirs, while colored circles represent the major reservoirs that have available RWT measurements. The inset map highlights the basin's location in the south-central United States. A compact description of all studied reservoirs (labeled 1–10 in this figure) is provided in Table \ref{reservoir_characteristics}.
  • Figure 2: Intra-annual distribution of temperature measurements across reservoirs during the study period (1996–2020). Each point on the plot represents a single temperature measurement recorded on a specific day of the year and depth for a particular reservoir. The blue dashed line represents the inter-annual daily average temperature for each reservoir, while the red solid line shows the overall inter-annual average temperature across all 10 reservoirs.
  • Figure 3: Summary of the methodological framework, from multi-source inputs through machine learning model training, testing, and evaluation, to SHAP feature ranking and Kolmogorov–Arnold Networks used to distill predictive equations.
  • Figure 4: Scatterplots comparing observed and predicted RWT values for the (left) RF, (center) XGBoost, and (right) MLP models on the testing dataset. The 1:1 line indicates perfect agreement, while the dashed lines represent ±10% error bounds. All models show strong alignment with the 1:1 line, indicating high predictive accuracy.
  • Figure 5: Quantile–Quantile (Q-Q) plots of observed vs. predicted values for the RF (left), XGBoost (center), and MLP (right) models on the test set. The red dashed line denotes the 1:1 reference line.
  • ...and 4 more figures
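
The ±10% error bounds drawn in Figure 4 suggest a simple companion statistic: the share of test predictions that fall between those dashed lines. The sketch below assumes the bounds are relative to the observed value (that is, the lines y = 0.9x and y = 1.1x); the caption does not say how the bounds were constructed, and the function name and sample values are hypothetical.

```python
def fraction_within_relative_bounds(observed, predicted, tolerance=0.10):
    """Fraction of predictions within ±tolerance of the observed value.

    Assumption: the error bound is relative to the observation, so a
    prediction p counts as inside when |p - o| <= tolerance * |o|.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(
        1 for o, p in zip(observed, predicted)
        if abs(p - o) <= tolerance * abs(o)
    )
    return inside / len(observed)


# Hypothetical observed and predicted temperatures in °C.
observed = [10.0, 20.0, 30.0, 5.0]
predicted = [10.5, 23.0, 29.0, 5.4]
print(fraction_within_relative_bounds(observed, predicted))  # 0.75
```

Because the bound scales with the observation, a fixed error in degrees is penalized more at low temperatures than at high ones, which is worth keeping in mind when reading the scatterplots.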