Toward Routing River Water in Land Surface Models with Recurrent Neural Networks

Mauricio Lima, Katherine Deck, Oliver R. A. Dunbar, Tapio Schneider

TL;DR

This study demonstrates that a runoff-driven long short-term memory (LSTM) network can learn river routing within a global land surface model, generalizing across time and across basins and outperforming a physics-based benchmark in many settings. Using a globally consistent dataset built from HydroSHEDS, HydroATLAS, and ERA5-Land, the authors train an LSTM whose inputs are daily basin runoff and static geographic attributes, and evaluate it with the Nash-Sutcliffe efficiency (NSE) and the Kling-Gupta efficiency (KGE). Training on globally diverse data improves time generalization, basin generalization is reasonable, and the LSTM outperforms LISFLOOD in both time- and basin-split tests (median NSE rising from 0.56 to 0.64 and from 0.30 to 0.34, respectively), although arid and data-poor regions remain challenging. The work outlines practical steps toward integrating ML-based river routing into land surface models (LSMs), discusses mass-balance considerations, and points to future work on inter-basin routing and mass-conserving architectures for global-scale applications.
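The TL;DR describes an LSTM whose inputs are daily basin runoff together with static geographic attributes. The sketch below illustrates only the forward pass of such a model in plain Python, under our own assumptions: two dynamic inputs (surface and sub-surface runoff, as in Figure 3), static attributes repeated at every time step, a standard LSTM cell, and a linear read-out to streamflow. The weights are random and untrained, and the names `build_inputs`, `LSTMCell`, and `run_lstm` are hypothetical; this is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def build_inputs(surface_runoff, subsurface_runoff, static_attrs):
    """Form x_t = (R_t^s, R_t^ss; static attributes) for every day t.

    Static attributes (e.g. a normalized catchment area) are repeated at each step.
    """
    return [[rs, rss] + list(static_attrs)
            for rs, rss in zip(surface_runoff, subsurface_runoff)]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), candidate (g), output (o) gates."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n_cols = n_in + n_hidden  # each gate acts on the concatenation [x_t, h_{t-1}]
        self.n_hidden = n_hidden
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)]
                         for _ in range(n_hidden)]
                  for gate in "ifgo"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifgo"}

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # cell state
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]           # hidden state
        return h_new, c_new


def run_lstm(cell, xs, head_w, head_b=0.0):
    """Unroll over time; a linear head maps each hidden state h_t to streamflow."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    flows = []
    for x in xs:
        h, c = cell.step(x, h, c)
        flows.append(sum(w * hk for w, hk in zip(head_w, h)) + head_b)
    return flows


# Usage: three days of runoff for one basin with one static attribute.
xs = build_inputs([1.0, 0.5, 0.0], [0.2, 0.2, 0.1], static_attrs=[0.3])
cell = LSTMCell(n_in=3, n_hidden=4)
flows = run_lstm(cell, xs, head_w=[0.5] * 4)  # one (untrained) value per day
```

Feeding the static attributes at every step is one common way to let a single set of weights be shared across basins, in the spirit of the CONUS results mentioned in the abstract; how the paper normalizes or encodes these attributes is not specified here.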

Abstract

Machine learning is playing an increasing role in hydrology, supplementing or replacing physics-based models. One notable example is the use of recurrent neural networks (RNNs) for forecasting streamflow given observed precipitation and geographic characteristics. Training of such a model over the continental United States (CONUS) has demonstrated that a single set of model parameters can be used across independent catchments, and that RNNs can outperform physics-based models. In this work, we take the next step and study the performance of RNNs for river routing in land surface models (LSMs). Instead of observed precipitation, the LSM-RNN uses instantaneous runoff calculated from physics-based models as an input. We train the model with data from river basins spanning the globe and test it using historical streamflow measurements. The model demonstrates skill at generalization across basins (predicting streamflow in catchments not used in training) and across time (predicting streamflow during years not used in training). We compare the predictions from the LSM-RNN to an existing physics-based model calibrated with a similar dataset and find that the LSM-RNN outperforms the physics-based model: a gain in median NSE from 0.56 to 0.64 (time-split experiment) and from 0.30 to 0.34 (basin-split experiment). Our results show that RNNs are effective for global streamflow prediction from runoff inputs and motivate the development of complete routing models that can capture nested sub-basin connections.
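The abstract summarizes skill as the median NSE over basins. As a hedged illustration, the sketch below computes NSE and KGE for one basin's simulated and observed streamflow series and takes the median over basins. It uses the original Kling-Gupta form (correlation r, variability ratio alpha, bias ratio beta), which we assume rather than take from the paper, together with an assumed convention of r = 0 when the simulation is constant. Under these assumptions a prediction equal to the mean observed flow scores NSE = 0 and KGE = 1 - √2, the reference values quoted for Figure 5. The function names are hypothetical.

```python
import math
import statistics


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs from their mean)."""
    mu_o = statistics.fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mu_o) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, bias ratio beta."""
    mu_s, mu_o = statistics.fmean(sim), statistics.fmean(obs)
    sd_s, sd_o = statistics.pstdev(sim), statistics.pstdev(obs)
    if sd_s == 0.0:
        # Assumed convention: a constant simulation has zero correlation with obs.
        r = 0.0
    else:
        cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
        r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def median_score(metric, basins):
    """Median of a per-basin metric; `basins` maps a basin id to (sim, obs) series."""
    return statistics.median([metric(sim, obs) for sim, obs in basins.values()])


# The mean-flow reference prediction gives the lower bounds used in Figure 5.
obs = [1.0, 2.0, 3.0, 6.0]
mean_flow = [statistics.fmean(obs)] * len(obs)
print(nse(mean_flow, obs))                                   # 0.0
print(math.isclose(kge(mean_flow, obs), 1 - math.sqrt(2)))  # True
```

Reporting the median rather than the mean keeps a few basins with very negative NSE (which is unbounded below) from dominating the summary.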

Paper Structure (25 sections, 6 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: (a) Illustration of the Monte-Carlo algorithm used to transform gridded ERA5-Land data into basin-specific data for HydroSHEDS basin 2070017000 (shaded pink, on the east coast of Spain). Grid cells lying entirely inside the basin have a weight of 1. Grid cells that straddle the basin boundary have weights less than 1, equal to the fraction of their area that lies within the basin. Other basins are shaded grey, with their boundaries outlined in black. The white area is the sea (here, the Balearic Sea). (b) GRDC gauges and the river network within the same basin. The chosen gauge (white circle) is the one used to calibrate our model for this basin, because it best represents the basin's entire drainage area. The other gauge drains a smaller catchment, so it captures only part of the basin's behavior.
  • Figure 2: Surface runoff, sub-surface runoff, and streamflow over one year for HydroSHEDS basin 2070017000, on the east coast of Spain.
  • Figure 3: Diagram of dependencies in the recurrent neural network. The input vector $x_t = (R_t^s, R_t^{ss}, ...; A, ...)$ concatenates dynamic inputs, such as instantaneous surface runoff $R_t^s$, sub-surface runoff $R_t^{ss}$, and other inputs that vary with time $t$, with static attributes such as the catchment area $A$. The variable $h$ denotes the hidden state. (Figure modeled after and inspired by those in Goodfellow et al., 2016.)
  • Figure 4: Cumulative distribution functions of NSE for the LSTM models trained on different datasets. The models in blue and red were trained with runoff as input; the model in purple was trained with precipitation as input. Solid lines indicate time-split datasets; dashed lines indicate basin-split datasets. The experiments are described in detail in the sections on temporal generalization and basin generalization.
  • Figure 5: Cumulative distribution functions of (a) NSE and (b) KGE for the LSTM and GloFAS. The domain is truncated to $[0,1]$ for NSE and to $[1-\sqrt{2}, 1]$ for KGE; the lower bounds are the scores of a reference prediction equal to the mean observed flow. The legend in (a) also applies to (b). The asterisk "*" marks that the GloFAS experiments are neither strictly basin-split nor time-split, because the exact set of dates used to calibrate that model is unknown. As detailed in the section on physical benchmarks, this may make the task easier for LISFLOOD.
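Figure 1 describes weighting ERA5-Land grid cells by the fraction of their area inside a basin, estimated with a Monte-Carlo algorithm. The following is a minimal sketch of that idea, not the authors' code. It assumes planar coordinates with cell area proportional to dx·dy (on a real latitude-longitude grid, cell area also scales with the cosine of latitude), a basin given as a single polygon, a ray-casting point-in-polygon test, and hypothetical function names. It uses the weights to form an area-weighted basin mean; summing weight × area × value would instead give a basin total.

```python
import random


def point_in_polygon(x, y, poly):
    """Ray casting: toggle on each polygon edge crossed by a ray from (x, y) toward +x."""
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cell_weight(x0, y0, dx, dy, poly, n_samples=2000, rng=None):
    """Monte-Carlo estimate of the fraction of cell [x0, x0+dx) x [y0, y0+dy) inside poly."""
    if rng is None:
        rng = random.Random(0)
    hits = sum(
        point_in_polygon(x0 + rng.random() * dx, y0 + rng.random() * dy, poly)
        for _ in range(n_samples)
    )
    return hits / n_samples


def basin_average(cells, poly, n_samples=2000, seed=0):
    """Area-weighted basin mean of a gridded field; cells are (x0, y0, dx, dy, value)."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    area_in_basin = 0.0
    for x0, y0, dx, dy, value in cells:
        w = cell_weight(x0, y0, dx, dy, poly, n_samples, rng)  # 1 inside, 0 outside
        weighted_sum += w * dx * dy * value
        area_in_basin += w * dx * dy
    return weighted_sum / area_in_basin


# Usage: unit-square basin; one cell inside, one straddling the edge, one outside.
basin = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0.25, 0.25, 0.5, 0.5, 2.0), (0.5, 0.0, 1.0, 1.0, 4.0), (2.0, 2.0, 1.0, 1.0, 100.0)]
print(basin_average(cells, basin))  # roughly 3.3 (exact area weights give 10/3)
```

The outside cell's large value has no effect because its estimated weight is 0, and the straddling cell contributes in proportion to the sampled fraction of its area inside the basin; more samples reduce the Monte-Carlo error in that fraction.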