AIFL: A Global Daily Streamflow Forecasting Model Using Deterministic LSTM Pre-trained on ERA5-Land and Fine-tuned on IFS

Maria Luisa Taccari; Kenza Tazi; Oisín M. Morrison; Andreas Grafberger; Juan Colonese; Corentin Carton de Wiart; Christel Prudhomme; Cinzia Mazzetti; Matthew Chantry; Florian Pappenberger

TL;DR

AIFL addresses the reanalysis-to-forecast domain shift in global streamflow forecasting by pre-training a deterministic LSTM on ERA5-Land reanalysis and then fine-tuning it on IFS forecasts, using a global basin set curated from CARAVAN. On the 2021–2024 test period, this two-stage transfer-learning approach achieves a median $KGE'$ of $0.66$ and a median $NSE$ of $0.53$ across 2,003 basins, while maintaining near-perfect volume balance and zero-false-alarm flood detection (precision = 1.0) for 1.5–50 year events. In head-to-head benchmarking against Google’s global model, AIFL is competitive overall and outperforms it at roughly 43% of shared stations, particularly in smaller basins where its single-stage deterministic pipeline remains stable. The work establishes a practical, reproducible baseline for global flood forecasting and points to probabilistic forecasts and multi-source forcing as future directions for improving recall of extremes. The $KGE'$ and $NSE$ scores indicate meaningful predictive skill under operational forcing, underscoring the model’s potential for real-world deployment.
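To make the precision figure concrete, the sketch below counts hits, misses and false alarms for threshold exceedances and derives precision and recall from them. It is a minimal illustration only: the function name `flood_detection_scores`, the per-day event definition, the single `threshold` argument and the toy discharge values are all assumptions, not the authors' evaluation code. In practice the threshold would come from a return-period analysis (for example the 1.5-year flow).

```python
import math


def flood_detection_scores(obs, sim, threshold):
    """Return (precision, recall) for days on which discharge >= threshold.

    Assumption: an "event" is any single day at or above the threshold.
    The paper may define events differently (e.g. per flood peak).
    """
    hits = misses = false_alarms = 0
    for o, s in zip(obs, sim):
        obs_event = o >= threshold
        sim_event = s >= threshold
        if obs_event and sim_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif sim_event:
            false_alarms += 1

    # Precision: share of predicted events that were observed.
    predicted = hits + false_alarms
    precision = hits / predicted if predicted else math.nan
    # Recall: share of observed events that were predicted.
    observed = hits + misses
    recall = hits / observed if observed else math.nan
    return precision, recall


# Toy series (hypothetical values, in m^3/s) with one missed event.
obs = [1.0, 5.0, 6.0, 1.0, 7.0]
sim = [1.0, 6.0, 2.0, 1.0, 8.0]
precision, recall = flood_detection_scores(obs, sim, threshold=5.0)
print(precision, recall)  # precision is 1.0, recall is 2/3
```

The toy case shows why a precision of 1.0 can coexist with imperfect recall: the model raises no false alarms but still misses one of the three observed exceedances.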

Abstract

Reliable global streamflow forecasting is essential for flood preparedness and water resource management, yet data-driven models often suffer from a performance gap when transitioning from historical reanalysis to operational forecast products. This paper introduces AIFL (Artificial Intelligence for Floods), a deterministic LSTM-based model designed for global daily streamflow forecasting. Trained on 18,588 basins curated from the CARAVAN dataset, AIFL utilises a novel two-stage training strategy to bridge the reanalysis-to-forecast domain shift. The model is first pre-trained on 40 years of ERA5-Land reanalysis (1980-2019) to capture robust hydrological processes, then fine-tuned on operational Integrated Forecasting System (IFS) control forecasts (2016-2019) to adapt to the specific error structures and biases of operational numerical weather prediction. To our knowledge, this is the first global model trained end-to-end within the CARAVAN ecosystem. On an independent temporal test set (2021-2024), AIFL achieves high predictive skill with a median modified Kling-Gupta Efficiency (KGE') of 0.66 and a median Nash-Sutcliffe Efficiency (NSE) of 0.53. Benchmarking results show that AIFL is highly competitive with current state-of-the-art global systems, achieving comparable accuracy while maintaining a transparent and reproducible forcing pipeline. The model demonstrates exceptional reliability in extreme-event detection, providing a streamlined and operationally robust baseline for the global hydrological community.
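For readers unfamiliar with the two headline scores, the following sketch computes NSE and the modified KGE for a single basin using their standard definitions, then takes the median across basins. It assumes the usual formulation of the modified KGE, in which the variability term is a ratio of coefficients of variation; the function names and the toy values are illustrative and are not taken from the paper.

```python
import math
from statistics import fmean, median


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of obs about its mean."""
    mean_obs = fmean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def kge_prime(obs, sim):
    """Modified Kling-Gupta Efficiency from correlation, bias and CV ratios."""
    mu_o, mu_s = fmean(obs), fmean(sim)
    sd_o = math.sqrt(fmean([(o - mu_o) ** 2 for o in obs]))
    sd_s = math.sqrt(fmean([(s - mu_s) ** 2 for s in sim]))
    cov = fmean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)                # linear correlation
    beta = mu_s / mu_o                     # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)  # variability ratio (CV-based)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Hypothetical per-basin series: (observed, simulated) daily discharge.
basins = [
    ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    ([2.0, 4.0, 3.0, 5.0], [2.5, 3.5, 3.0, 4.0]),
    ([1.0, 3.0, 2.0, 6.0], [1.0, 2.0, 2.0, 5.0]),
]
median_kge = median(kge_prime(o, s) for o, s in basins)
median_nse = median(nse(o, s) for o, s in basins)
print(median_kge, median_nse)
```

Both scores equal 1 for a perfect simulation. A simulation that doubles every observed value keeps perfect correlation and variability ratio but has a bias ratio of 2, so its modified KGE drops to 0.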

Paper Structure (14 sections, 1 equation, 10 figures, 4 tables)

Introduction
Data Curation & Experimental Design
Datasets and Target Variable
Deduplication and Quality Control
Model Inputs and Consistency
Data Availability for Temporal Evaluation
Methodology
Model Architecture
Training Strategy
Results and Evaluation
Temporal generalisation and Global Performance
Flood Event Performance
Benchmarking: AIFL vs. Google Global Model
Conclusion and Future Directions

Figures (10)

Figure 1: Schematic of the AIFL framework. The model architecture uses separate Multi-Layer Perceptron (MLP) embedding layers for static and dynamic inputs, feeding a shared LSTM core that processes a 170-day hindcast window to generate 10-day forecasts. The training strategy transitions from ERA5-Land reanalysis pre-training to IFS forecast fine-tuning to address the domain shift between the two forcings.
Figure 2: Global spatial distribution of the 18,588 quality-controlled streamflow stations across the three experimental stages: pre-training, fine-tuning, and testing. The inset diagrams provide the frequency distribution of basin surface areas (on a $\log_{10}$ scale) for each subset.
Figure 3: Global station availability over time (1950–2023). Shaded regions indicate the splits for pre-training and fine-tuning (green), validation (blue), and testing (orange).
Figure 4: Global spatial distribution of the normalised Wasserstein distance ($W_1$) between ERA5-Land reanalysis and 1-day lead time (LT1) IFS daily precipitation. The metric quantifies the distributional shift between the pre-training and operational forcing data, calculated over a common 4-year period (2016–2019, $n = 1{,}461$ days) for 2,003 basins. To ensure cross-climatological comparability, $W_1$ is computed using only wet days ($>1$ mm) and normalised by the mean ERA5-Land wet-day precipitation. The inset Cumulative Distribution Function (CDF) highlights the right-skewed nature of the discrepancies: while the median shift is relatively small ($0.045$), a "large" difference, defined as the upper decile of the distribution, corresponds to $W_1 > 0.119$, with extreme cases reaching $0.638$. These high-distance regions identify where operational forecasts deviate most from the training reanalysis, providing a quantitative basis for the fine-tuning stage, which is intended to mitigate forecast-induced streamflow biases.
Figure 5: Hydro-meteorological time series for the Hokitika River, Gorge (New Zealand; 363 km$^2$) during 2022. Top: precipitation ($P$) from ERA5-Land (blue) and IFS Control LT1 (yellow). Bottom: observed discharge ($Q$; black dashed) compared against two AIFL model configurations both forced by the same IFS Control LT1 inputs: the pre-trained model (steel blue) and the fine-tuned model (red). This direct comparison isolates the impact of the model weights, demonstrating how the fine-tuned AIFL learns to correct for the systematic wet bias in IFS precipitation peaks to align streamflow magnitudes with observations.
...and 5 more figures
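The normalised Wasserstein distance described in the Figure 4 caption can be sketched for a single basin as follows. This is a minimal illustration under stated assumptions: each series is filtered to wet days independently (the caption does not say whether filtering is joint or per-series), the 1D distance is computed as the area between the two empirical CDFs, and the names `wasserstein_1d` and `normalised_w1` as well as the toy precipitation values are hypothetical.

```python
from bisect import bisect_right
from statistics import fmean


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two empirical samples.

    Computed as the integral of |F_x - F_y| over the pooled sample values,
    which also works when the two samples have different sizes.
    """
    xs, ys = sorted(x), sorted(y)
    pooled = sorted(xs + ys)
    total = 0.0
    for lo, hi in zip(pooled[:-1], pooled[1:]):
        cdf_x = bisect_right(xs, lo) / len(xs)
        cdf_y = bisect_right(ys, lo) / len(ys)
        total += abs(cdf_x - cdf_y) * (hi - lo)
    return total


def normalised_w1(era5_precip, ifs_precip, wet_threshold=1.0):
    """W1 on wet days only, divided by the mean ERA5-Land wet-day precipitation.

    Assumption: the wet-day filter (> wet_threshold, in mm) is applied to
    each series separately.
    """
    era5_wet = [p for p in era5_precip if p > wet_threshold]
    ifs_wet = [p for p in ifs_precip if p > wet_threshold]
    if not era5_wet or not ifs_wet:
        return float("nan")  # no wet days to compare
    return wasserstein_1d(era5_wet, ifs_wet) / fmean(era5_wet)


# Toy daily precipitation in mm (hypothetical values).
era5 = [0.0, 2.0, 4.0, 0.5]
ifs = [0.0, 3.0, 5.0, 0.2]
print(normalised_w1(era5, ifs))  # wet-day W1 of 1 mm over a 3 mm mean, i.e. 1/3
```

Dividing by the ERA5-Land wet-day mean makes the score dimensionless, so a value such as the reported median of 0.045 can be read as a shift of about 4.5% of typical wet-day rainfall.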
