On the Adversarial Robustness of Hydrological Models

Yang Yang, Joseph Janssen, Hoshin Gupta, Ting Fong May Chui

TL;DR

This work compares physical-conceptual and deep learning-based hydrological models across 1,347 German catchments under fast gradient sign method (FGSM) perturbations of varying magnitudes and finds that, as expected, the perturbations systematically reduce KGE and increase MSE.

Abstract

The evaluation of hydrological models is essential for both model selection and reliability assessment. However, simply comparing predictions to observations is insufficient for understanding the global landscape of model behavior. This is especially true for many deep learning models, whose structures are complex. Further, in risk-averse operational settings, water managers require models that are trustworthy and provably safe, as non-robustness can put our critical infrastructure at risk. Motivated by the need to select reliable models for operational deployment, we introduce and explore adversarial robustness analysis in hydrological modeling, evaluating whether small, targeted perturbations to meteorological forcings induce substantial changes in simulated discharge. We compare physical-conceptual and deep learning-based hydrological models across 1,347 German catchments under perturbations of varying magnitudes, using the fast gradient sign method (FGSM). We find that, as expected, the FGSM perturbations systematically reduce KGE and increase MSE. However, catastrophic failure is rare and, surprisingly, LSTMs generally demonstrate greater robustness than HBV models. Further, changes in both the predicted hydrographs and the internal model states often respond approximately linearly (at least locally) as perturbation size increases, providing a compact summary of how errors grow under such perturbations. Similar patterns are also observed for random perturbations, suggesting that small input changes usually introduce approximately proportional changes in model output. Overall, these findings support further consideration of LSTMs for operational deployment (due both to their predictive power and robustness), and motivate future work on both characterizing model responses to input changes and improving robustness through architectural modifications and training design.
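As a rough illustration of the procedure the abstract describes, the sketch below applies a single FGSM step to the inputs of a toy model and compares MSE and KGE before and after. It is a minimal sketch under stated assumptions, not the paper's implementation: the linear `toy_model` stands in for HBV or the LSTM, the forcings are synthetic and assumed to be standardized, and every name in it (`toy_model`, `fgsm_perturb`, `kge`, `mse`) is hypothetical. The KGE function follows the standard formulation based on correlation, variability ratio, and bias ratio.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta efficiency: 1 is a perfect match, lower is worse."""
    r = np.corrcoef(sim, obs)[0, 1]        # linear correlation
    alpha = np.std(sim) / np.std(obs)      # variability ratio
    beta = np.mean(sim) / np.mean(obs)     # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def mse(sim, obs):
    """Mean squared error between simulated and observed discharge."""
    return float(np.mean((sim - obs) ** 2))


def toy_model(x, w, b):
    """Hypothetical stand-in for a hydrological model: a linear map from
    forcings to discharge (NOT HBV or an LSTM)."""
    return x @ w + b


def fgsm_perturb(x, obs, w, b, eps):
    """One FGSM step: move every input by eps in the direction that
    increases the MSE loss, i.e. x + eps * sign(dL/dx)."""
    residual = toy_model(x, w, b) - obs
    # Analytic gradient of the MSE loss with respect to the inputs
    # (possible here only because the toy model is linear).
    grad = (2.0 / len(obs)) * residual[:, None] * w[None, :]
    return x + eps * np.sign(grad)


# Synthetic, standardized forcings (columns: P, T, PET). Assumed data.
rng = np.random.default_rng(0)
n_days = 365
x = rng.normal(size=(n_days, 3))
w = np.array([0.8, 0.3, -0.4])
b = 2.0
obs = toy_model(x, w, b) + 0.1 * rng.normal(size=n_days)

eps = 0.2  # perturbation size, in standardized input units (assumption)
x_adv = fgsm_perturb(x, obs, w, b, eps)

sim_base = toy_model(x, w, b)
sim_adv = toy_model(x_adv, w, b)

mse_base, mse_adv = mse(sim_base, obs), mse(sim_adv, obs)
kge_base, kge_adv = kge(sim_base, obs), kge(sim_adv, obs)

print(f"MSE before: {mse_base:.4f}  after: {mse_adv:.4f}")
print(f"KGE before: {kge_base:.4f}  after: {kge_adv:.4f}")
```

For a real LSTM the input gradient would come from automatic differentiation rather than a closed form, and for HBV it would require a differentiable implementation or a numerical approximation. For this linear toy model the step increases every absolute residual, so MSE is guaranteed to rise; no such guarantee holds for nonlinear models, where the sign of the gradient is only a first-order guide.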

Paper Structure (31 sections, 7 equations, 13 figures, 2 tables)

Figures (13)

  • Figure 1: The structure of the HBV conceptual hydrological model. Inspired by a similar graphic from shrestha2008data. The storages and fluxes represented here include SF (snowfall), RF (rainfall), ET (evapotranspiration), SP (snowpack), MW (meltwater), SM (soil moisture), SUZ (upper zone storage), SLZ (lower zone storage), and Q (streamflow).
  • Figure 2: Comparison of CAMELS-DE (black) and Caravan-DE (red) time series for precipitation (top), temperature (middle), and potential evapotranspiration (bottom) in two catchments in 2016: a representative catchment with average precipitation, snowfall, and precipitation seasonality (DE911520, left) and the driest catchment in the CAMELS-DE dataset (DEE10410, right). Mean absolute error (MAE) and bias (Caravan - CAMELS) are also shown.
  • Figure 3: Maps of Kling–Gupta efficiency (KGE) across CAMELS‑DE gauging stations for the LSTM and HBV models before and after FGSM adversarial perturbations ($\epsilon=0.2$). Panels show (a) LSTM before, (b) LSTM after, (c) HBV before, and (d) HBV after perturbations; point colors denote KGE categories as in the legend.
  • Figure 4: KGE across CAMELS‑DE gauging stations before and after an FGSM perturbation ($\epsilon=0.20$) for LSTM and HBV. (a) Connected dot plot: for each catchment, a gray line links KGE before (left) to after (right) within each model (LSTM in blue, HBV in green; jitter added for visibility). (b) Empirical cumulative distribution functions of KGE, with solid lines for before‑perturbation and dashed lines for after‑perturbation distributions, summarizing the distributional shift. A few sites have $\mathrm{KGE}<-1$ (LSTM: 1 before/1 after; HBV: 1 before/3 after) and are outside the $[-1,1]$ axis limits.
  • Figure 5: Baseline and FGSM‑perturbed meteorological forcings (precipitation $P$, temperature $T$, and potential evapotranspiration $PET$) and discharge ($Q$) for CAMELS‑DE catchment DE911520 over 2016 ($\epsilon=0.2$) for both LSTM and HBV models. Baseline and perturbed $P$ and $T$ traces largely overlap, while $Q$ diverges visibly.
  • ...and 8 more figures