A curated UK rain radar data set for training and benchmarking nowcasting models

Viv Atureta, Rifki Priansyah Jasin, Stefan Siegert

TL;DR

This paper introduces a curated UK rain radar dataset designed for nowcasting benchmarking, comprising 1,000 sequences of 20 frames (40×40) at 15-minute intervals with rich metadata and open tooling for Nimrod binary data. It describes the Nimrod data archive, a stratified sampling pipeline to ensure spatially uniform coverage while enforcing a precipitation threshold, and the augmentation of sequences with terrain, wind, and storm-type information. A case study demonstrates a CNN-based next-frame predictor that substantially reduces MSE against a persistence baseline, illustrating the dataset’s utility for rapid prototyping and method comparison. The work provides a reproducible workflow and ready-to-use code to extract and evaluate nowcasting methods on standardized UK radar data, enabling broader benchmarking and adaptation to regional applications.

Abstract

This paper documents a data set of UK rain radar image sequences for use in statistical modeling and machine learning methods for nowcasting. The main data set contains 1,000 randomly sampled sequences of 20 time steps (15-minute increments) of 2D radar intensity fields of dimension 40×40 (at 5 km spatial resolution). Spatially stratified sampling ensures spatial homogeneity despite the removal of clear-sky cases by threshold-based truncation. For each radar sequence, additional atmospheric and geographic features are made available, including date, location, mean elevation, mean wind direction and speed, and prevailing storm type. New R functions to extract data from the binary "Nimrod" radar data format are provided. A case study trains and evaluates a simple convolutional neural network for radar nowcasting, and includes self-contained R code.
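
To make the persistence baseline mentioned in the TL;DR concrete, the following is a minimal sketch of how such a baseline could be scored with MSE on one sequence of 20 frames of size 40×40. It is not the paper's R code: the function names (`persistence_forecast`, `mse`, `make_toy_sequence`) are hypothetical, and the frames are filled with uniform random numbers standing in for real radar intensities.

```python
import random


def persistence_forecast(sequence):
    """Persistence: the forecast for frame t+1 is simply frame t."""
    return sequence[:-1]


def mse(pred_frames, true_frames):
    """Mean squared error over all pixels of all paired frames."""
    total, count = 0.0, 0
    for pred, true in zip(pred_frames, true_frames):
        for pred_row, true_row in zip(pred, true):
            for p, t in zip(pred_row, true_row):
                total += (p - t) ** 2
                count += 1
    return total / count


def make_toy_sequence(n_frames=20, size=40, seed=0):
    """Placeholder data (assumption): random values, not real radar fields."""
    rng = random.Random(seed)
    return [
        [[rng.random() for _ in range(size)] for _ in range(size)]
        for _ in range(n_frames)
    ]


seq = make_toy_sequence()
# Compare each forecast (frame t) against the observed next frame (t+1).
baseline_mse = mse(persistence_forecast(seq), seq[1:])
print(f"persistence MSE on toy data: {baseline_mse:.4f}")
```

A learned next-frame predictor would be scored with the same `mse` function against `seq[1:]`, so that its error is directly comparable to the baseline value.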

Paper Structure

This paper contains 12 sections, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Fraction of available radar fields out of all possible 15-minute time stamps in each month.
  • Figure 2: Example plot of Storm "Amy" moving over the UK on 2025-10-03.
  • Figure 3: Left: Sampling locations and spatial strata. Right: Number of samples per year compared to 95% simultaneous probability interval of the Binom(1000, 1/11) distribution (Bonferroni corrected).
  • Figure 4: Sample sequences from different locations and seasons. Brighter colours indicate higher precipitation rate. Date and time of first frame and coordinates of center point are given in captions.
  • Figure 5: Total precipitation aggregates per sampled sequence, and histogram in log-linear scales.
  • ...and 2 more figures
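
The caption of Figure 3 refers to a Bonferroni-corrected 95% simultaneous probability interval of the Binom(1000, 1/11) distribution. The sketch below shows one way such an interval could be computed, assuming an equal-tailed interval and that the correction divides the error level by the number of years compared (presumably 11, matching the success probability 1/11). The paper's exact construction is not given here, and the function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass, evaluated in log space for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def binom_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if cdf >= q:
            return k
    return n


def bonferroni_interval(n, p, n_comparisons, level=0.95):
    """Equal-tailed interval with the error level split across comparisons."""
    alpha = (1.0 - level) / n_comparisons  # Bonferroni: per-comparison error
    lower = binom_quantile(alpha / 2, n, p)
    upper = binom_quantile(1 - alpha / 2, n, p)
    return lower, upper


# Assumed setting: 1,000 samples spread evenly over 11 years.
corrected = bonferroni_interval(1000, 1 / 11, n_comparisons=11)
uncorrected = bonferroni_interval(1000, 1 / 11, n_comparisons=1)
print("corrected:", corrected, "uncorrected:", uncorrected)
```

The corrected interval is at least as wide as the uncorrected one, which is what allows all yearly counts to be checked against it at once.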