Fine Flood Forecasts: Incorporating local data into global models through fine-tuning

Emil Ryd, Grey Nearing

TL;DR

This paper tackles the usability gap in ML-based flood forecasting by combining global pre-training with basin-level fine-tuning to retain accuracy while enabling local ownership. It pre-trains on the Caravan dataset with 6,375 basins and fine-tunes on 159 basins, achieving significant gains in NSE and KGE over pre-trained models, especially for basins that underperform initially. The approach reduces data-sharing barriers and provides an actionable workflow and open-source resources for national forecasters. A key finding is that fine-tuning gains are larger where the pre-trained model performs poorly, suggesting a pragmatic path to tailor global models to local hydrological contexts.
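
The two skill scores named above, Nash-Sutcliffe Efficiency (NSE) and Kling-Gupta Efficiency (KGE), can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' evaluation code; the function names `nse` and `kge` are hypothetical, and the KGE form shown (correlation, variability ratio, bias ratio) is assumed, since the summary does not say which variant the paper uses.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches predicting the observed mean."""
    m = _mean(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance_sum = sum((o - m) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance_sum


def kge(obs, sim):
    """Kling-Gupta Efficiency (assumed standard form): 1 is a perfect fit."""
    n = len(obs)
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)   # linear correlation between observed and simulated flow
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both functions take equal-length sequences of observed and simulated streamflow for one basin. Neither is defined for a constant observed series, and `kge` is also undefined for a constant simulated series.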

Abstract

Floods are the most common form of natural disaster and accurate flood forecasting is essential for early warning systems. Previous work has shown that machine learning (ML) models are a promising way to improve flood predictions when trained on large, geographically-diverse datasets. This requirement of global training can result in a loss of ownership for national forecasters who cannot easily adapt the models to improve performance in their region, preventing ML models from being operationally deployed. Furthermore, traditional hydrology research with physics-based models suggests that local data -- which in many cases is only accessible to local agencies -- is valuable for improving model performance. To address these concerns, we demonstrate a methodology of pre-training a model on a large, global dataset and then fine-tuning that model on data from individual basins. This results in performance increases, validating our hypothesis that there is extra information to be captured in local data. In particular, we show that performance increases are most significant in watersheds that underperform during global training. We provide a roadmap for national forecasters who wish to take ownership of global models using their own data, aiming to lower the barrier to operational deployment of ML-based hydrological forecast systems.

Paper Structure

This paper contains 14 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Left: world map showing all 6,375 basins in the Caravan dataset (grey), and our 159 randomly sampled basins (blue). Right: bar chart showing how well each country is represented in our sample compared to the dataset as a whole.
  • Figure 2: Correlation between fine-tuning improvements and pre-trained model skill across all basins (left) and aggregated across all countries (right). There is a negative correlation between pre-trained prediction skill and the improvement attained from fine-tuning.
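
The relationship described in Figure 2 can be checked with a short sketch like the one below. It assumes a Pearson correlation between each basin's pre-trained score and its change after fine-tuning; the summary does not state which correlation measure the paper uses, and the function names and the numbers in the usage note are hypothetical, made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def skill_vs_improvement(pretrained, finetuned):
    """Correlate per-basin pre-trained skill with the gain from fine-tuning."""
    improvement = [f - p for p, f in zip(pretrained, finetuned)]
    return pearson(pretrained, improvement)
```

For example, with invented per-basin scores `pretrained = [0.2, 0.4, 0.6, 0.8]` and `finetuned = [0.5, 0.6, 0.7, 0.85]`, `skill_vs_improvement(pretrained, finetuned)` is negative, which is the pattern the caption reports: basins with weaker pre-trained skill gain more.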