AIFL: A Global Daily Streamflow Forecasting Model Using Deterministic LSTM Pre-trained on ERA5-Land and Fine-tuned on IFS

Maria Luisa Taccari, Kenza Tazi, Oisín M. Morrison, Andreas Grafberger, Juan Colonese, Corentin Carton de Wiart, Christel Prudhomme, Cinzia Mazzetti, Matthew Chantry, Florian Pappenberger

TL;DR

AIFL addresses the reanalysis-to-forecast domain shift in global streamflow forecasting by pre-training a deterministic LSTM on ERA5-Land data and then fine-tuning it on IFS forecasts, using a CARAVAN-based global basin set. This two-stage transfer learning approach achieves a median $KGE'$ of $0.66$ and a median NSE of $0.53$ over the 2021–2024 test period across 2,003 basins, while maintaining near-perfect volume balance and zero-false-alarm flood detection (precision = 1.0) for events with return periods of 1.5–50 years. In head-to-head benchmarking against Google’s global model, AIFL is competitive overall and outperforms at roughly 43% of shared stations, particularly in smaller basins where its single-stage deterministic pipeline remains stable. The work establishes a practical, reproducible baseline for global flood forecasting and points to probabilistic forecasts and multi-source forcing as future directions for improving recall of extremes. The $KGE'$ and NSE scores indicate meaningful predictive skill under operational forcing, underscoring the model’s potential for real-world deployment.

Abstract

Reliable global streamflow forecasting is essential for flood preparedness and water resource management, yet data-driven models often suffer from a performance gap when transitioning from historical reanalysis to operational forecast products. This paper introduces AIFL (Artificial Intelligence for Floods), a deterministic LSTM-based model designed for global daily streamflow forecasting. Trained on 18,588 basins curated from the CARAVAN dataset, AIFL utilises a novel two-stage training strategy to bridge the reanalysis-to-forecast domain shift. The model is first pre-trained on 40 years of ERA5-Land reanalysis (1980-2019) to capture robust hydrological processes, then fine-tuned on operational Integrated Forecasting System (IFS) control forecasts (2016-2019) to adapt to the specific error structures and biases of operational numerical weather prediction. To our knowledge, this is the first global model trained end-to-end within the CARAVAN ecosystem. On an independent temporal test set (2021-2024), AIFL achieves high predictive skill with a median modified Kling-Gupta Efficiency (KGE') of 0.66 and a median Nash-Sutcliffe Efficiency (NSE) of 0.53. Benchmarking results show that AIFL is highly competitive with current state-of-the-art global systems, achieving comparable accuracy while maintaining a transparent and reproducible forcing pipeline. The model demonstrates exceptional reliability in extreme-event detection, providing a streamlined and operationally robust baseline for the global hydrological community.
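The two headline scores can be made concrete with a minimal sketch. It assumes the standard definitions: NSE as one minus the ratio of squared error to observed variance, and the modified KGE' built from the correlation $r$, the bias ratio $\beta = \mu_s/\mu_o$, and the variability ratio $\gamma = CV_s/CV_o$. The function names `nse` and `kge_prime` are illustrative and are not taken from the AIFL code; the paper's exact implementation may differ in detail, for example in how missing observations are handled.

```python
import math
from statistics import fmean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations."""
    mean_obs = fmean(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def kge_prime(obs, sim):
    """Modified Kling-Gupta Efficiency (assumed standard form)."""
    mu_o, mu_s = fmean(obs), fmean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    # Pearson correlation between simulated and observed flow
    cov = fmean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)
    beta = mu_s / mu_o                      # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)   # ratio of coefficients of variation
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


if __name__ == "__main__":
    observed = [1.0, 3.0, 2.0, 5.0, 4.0]    # made-up discharge values
    simulated = [1.2, 2.7, 2.4, 4.6, 4.1]
    print(f"NSE  = {nse(observed, simulated):.3f}")
    print(f"KGE' = {kge_prime(observed, simulated):.3f}")
```

Both scores equal 1 for a perfect simulation. A simulation that doubles every observed value keeps $r = 1$ and $\gamma = 1$ but has $\beta = 2$, so its KGE' drops to 0, which shows how the metric penalises volume bias separately from timing.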

Paper Structure

This paper contains 14 sections, 1 equation, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: Schematic of the AIFL framework. The model architecture uses separate Multi-Layer Perceptron (MLP) embedding layers for static and dynamic inputs, feeding a shared LSTM core that processes a 170-day hindcast window to generate 10-day forecasts. The training strategy transitions from ERA5-Land reanalysis pre-training to IFS forecast fine-tuning to resolve domain shifts.
  • Figure 2: Global spatial distribution of the 18,588 quality-controlled streamflow stations across the three experimental stages: pre-training, fine-tuning, and testing. The inset diagrams provide the frequency distribution of basin surface areas (on a $\log_{10}$ scale) for each subset.
  • Figure 3: Global station availability over time (1950–2023). Shaded regions indicate the splits for pre-training and fine-tuning (green), validation (blue), and testing (orange).
  • Figure 4: Global spatial distribution of the normalised Wasserstein distance ($W_1$) between ERA5-Land reanalysis and 1-day lead time (LT1) IFS daily precipitation. The metric quantifies the distributional shift between the pre-training and operational forcing data, calculated over a common 4-year period (2016–2019, $n = 1{,}461$ days) for 2,003 basins. To ensure cross-climatological comparability, $W_1$ is computed using only wet days ($>1$ mm) and normalised by the mean ERA5-Land wet-day precipitation. The inset Cumulative Distribution Function (CDF) highlights the right-skewed nature of the discrepancies: while the median shift is relatively small ($0.045$), a "large" difference, defined as the upper decile of the distribution, corresponds to $W_1 > 0.119$, with extreme cases reaching $0.638$. These high-distance regions identify where operational forecasts deviate most significantly from the training reanalysis, providing a quantitative basis for the necessity of the fine-tuning stage to mitigate forecast-induced streamflow biases.
  • Figure 5: Hydro-meteorological time series for the Hokitika River, Gorge (New Zealand; 363 km$^2$) during 2022. Top: precipitation ($P$) from ERA5-Land (blue) and IFS Control LT1 (yellow). Bottom: observed discharge ($Q$; black dashed) compared against two AIFL model configurations both forced by the same IFS Control LT1 inputs: the pre-trained model (steel blue) and the fine-tuned model (red). This direct comparison isolates the impact of the model weights, demonstrating how the fine-tuned AIFL learns to correct for the systematic wet bias in IFS precipitation peaks to align streamflow magnitudes with observations.
  • ...and 5 more figures
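
The normalised wet-day Wasserstein distance described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the wet-day filter ($>1$ mm) is assumed to be applied to each dataset independently, $W_1$ is computed as the integral of the absolute difference between the two empirical CDFs, and the names `wasserstein_1` and `normalised_wet_day_w1` are hypothetical.

```python
from bisect import bisect_right
from statistics import fmean


def wasserstein_1(u, v):
    """Empirical 1-Wasserstein distance: integral of |F_u(x) - F_v(x)| dx."""
    u, v = sorted(u), sorted(v)
    grid = sorted(u + v)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Empirical CDFs are constant on [left, right), so evaluate at left
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def normalised_wet_day_w1(era5, ifs, wet_threshold_mm=1.0):
    """W1 between wet-day precipitation samples, scaled by the mean
    ERA5-Land wet-day precipitation (assumed per-dataset wet-day filter)."""
    wet_era5 = [p for p in era5 if p > wet_threshold_mm]
    wet_ifs = [p for p in ifs if p > wet_threshold_mm]
    return wasserstein_1(wet_era5, wet_ifs) / fmean(wet_era5)


if __name__ == "__main__":
    # Made-up daily precipitation (mm) for one basin
    era5_precip = [0.5, 2.0, 4.0, 6.0]
    ifs_precip = [0.2, 3.0, 5.0, 7.0]
    print(normalised_wet_day_w1(era5_precip, ifs_precip))
```

In the toy example the IFS wet days are uniformly 1 mm wetter than the ERA5-Land wet days, so $W_1 = 1$ mm; dividing by the mean ERA5-Land wet-day value of 4 mm gives a normalised distance of about 0.25. Normalising in this way makes a given shift count for more in a dry basin than in a wet one.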