A curated UK rain radar data set for training and benchmarking nowcasting models
Viv Atureta, Rifki Priansyah Jasin, Stefan Siegert
TL;DR
This paper introduces a curated UK rain radar dataset designed for nowcasting benchmarking, comprising 1,000 sequences of 20 frames (40×40) at 15-minute intervals, with rich metadata and open tooling for Nimrod binary data. It describes the Nimrod data archive, a stratified sampling pipeline that ensures spatially uniform coverage while enforcing a precipitation threshold, and the augmentation of sequences with terrain, wind, and storm-type information. A case study demonstrates a CNN-based next-frame predictor that substantially reduces mean squared error (MSE) relative to a persistence baseline, illustrating the dataset's utility for rapid prototyping and method comparison. The work provides a reproducible workflow and ready-to-use code to extract the data and evaluate nowcasting methods on standardized UK radar data, enabling broader benchmarking and adaptation to regional applications.
Abstract
This paper documents a data set of UK rain radar image sequences for use with statistical modeling and machine learning methods for nowcasting. The main data set contains 1,000 randomly sampled sequences, each 20 time steps long (15-minute increments), of 2D radar intensity fields of dimension 40×40 at 5 km spatial resolution. Clear-sky cases are removed by threshold-based truncation, and spatially stratified sampling ensures that the retained sequences remain spatially homogeneous. For each radar sequence, additional atmospheric and geographic features are provided, including date, location, mean elevation, mean wind direction and speed, and prevailing storm type. New R functions for extracting data from the binary "Nimrod" radar data format are provided. A case study trains and evaluates a simple convolutional neural network for radar nowcasting and includes self-contained R code.
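The interplay between threshold-based truncation and spatial stratification can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the paper's R pipeline, whose details may differ. All names and values here (`GRID`, `N_TIMES`, `THRESHOLD`, `mean_rain`, and the synthetic rainfall model) are hypothetical assumptions. The idea shown is that fixing the spatial stratum before applying the rain threshold keeps the spatial coverage uniform, whereas rejecting dry cases from jointly sampled locations and times favours wetter regions.

```python
import random
from collections import Counter

GRID = 4           # hypothetical: domain split into GRID x GRID spatial strata
N_TIMES = 10_000   # hypothetical number of candidate start times in the archive
THRESHOLD = 0.5    # hypothetical minimum mean rain rate for a sequence to be kept


def mean_rain(stratum, t):
    """Synthetic stand-in for the mean intensity of a radar crop.

    Strata with a small column index are made wetter on purpose, so that
    naive rejection sampling over-represents them.
    """
    row, col = stratum
    rng = random.Random((row * GRID + col) * N_TIMES + t)
    wet_probability = 0.6 - 0.15 * col
    return rng.uniform(0.5, 5.0) if rng.random() < wet_probability else 0.0


def stratified_sample(n_sequences, seed=0):
    """Fix the stratum first, then redraw the time until the threshold is met."""
    rng = random.Random(seed)
    strata = [(r, c) for r in range(GRID) for c in range(GRID)]
    samples = []
    for i in range(n_sequences):
        stratum = strata[i % len(strata)]  # equal allocation, chosen before rejection
        while True:
            t = rng.randrange(N_TIMES)
            if mean_rain(stratum, t) >= THRESHOLD:
                samples.append((stratum, t))
                break
    return samples


def naive_sample(n_sequences, seed=0):
    """Draw location and time jointly and reject dry cases (spatially biased)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n_sequences:
        stratum = (rng.randrange(GRID), rng.randrange(GRID))
        t = rng.randrange(N_TIMES)
        if mean_rain(stratum, t) >= THRESHOLD:
            samples.append((stratum, t))
    return samples


if __name__ == "__main__":
    # Print how many retained sequences fall in each column of strata.
    for name, sampler in [("stratified", stratified_sample), ("naive", naive_sample)]:
        per_column = Counter(stratum[1] for stratum, _ in sampler(1600))
        print(name, [per_column[c] for c in range(GRID)])
```

In this toy setup the stratified sampler returns the same number of sequences for every stratum by construction, while the naive sampler concentrates its samples in the wetter columns.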
