Radar DataTree: A FAIR and Cloud-Native Framework for Scalable Weather Radar Archives

Alfonso Ladino-Rincon, Stephen W. Nesbitt

TL;DR

Radar DataTree addresses the fragmentation of weather radar archives by introducing a dataset-level, FAIR-aligned framework that extends FM-301/CfRadial 2.1 from single scans to time-resolved collections. The approach uses xarray.DataTree to form a hierarchical, time-indexed radar archive, serialized as Zarr and persisted with Icechunk to provide ACID guarantees, while the Raw2Zarr pipeline handles end-to-end ETL via Xradar for robust cloud-native processing. Empirical validation on real archives (e.g., the NEXRAD KVNX radar) demonstrates substantial performance gains for core workflows such as Quasi-Vertical Profile (QVP) generation and Quantitative Precipitation Estimation (QPE), along with reproducibility supported by transactional updates. This work offers a reproducible, extensible foundation for scalable radar data stewardship, enabling AI-ready workflows and efficient multi-scan analysis in cloud environments.

Abstract

We introduce Radar DataTree, the first dataset-level framework that extends the WMO FM-301 standard from individual radar volume scans to time-resolved, analysis-ready archives. Weather radar data are among the most scientifically valuable yet structurally underutilized Earth observation datasets. Despite widespread public availability, radar archives remain fragmented, vendor-specific, and poorly aligned with FAIR (Findable, Accessible, Interoperable, Reusable) principles, hindering large-scale research, reproducibility, and cloud-native computation. Radar DataTree addresses these limitations with a scalable, open-source architecture that transforms operational radar archives into FAIR-compliant, cloud-optimized datasets. Built on the FM-301/CfRadial 2.1 standard and implemented using xarray DataTree, Radar DataTree organizes radar volume scans as hierarchical, metadata-rich structures and serializes them to Zarr for scalable analysis. Coupled with Icechunk for ACID-compliant storage and versioning, this architecture enables efficient, parallel computation across thousands of radar scans with minimal preprocessing. We demonstrate significant performance gains in case studies including Quasi-Vertical Profile (QVP) and precipitation accumulation workflows, and release all tools and datasets openly via the Raw2Zarr repository. This work contributes a reproducible and extensible foundation for radar data stewardship, high-performance geoscience, and AI-ready weather infrastructure.

Paper Structure

This paper contains 20 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: End-to-end ETL pipeline for radar archive ingestion. Raw radar files hosted in cloud storage (e.g., NEXRAD Level II, SIGMET) are decoded using Xradar, structured into a time-aligned hierarchical dataset using xarray.DataTree, and serialized into transactional Zarr stores managed by Icechunk. The entire process is implemented in the open-source Raw2Zarr package.
  • Figure 2: Interactive exploration of the KVNX Radar DataTree using xarray.DataTree and the arraylake Python client. The full May 2011 archive (765 GB) is loaded as a single navigable object. Each node represents a Volume Coverage Pattern (VCP) and can be accessed using a simple path syntax. Radar reflectivity variables (e.g., DBZH) are self-described, chunked, and cloud-optimized for scalable analysis.
  • Figure 3: Interactive computation of two standard radar science products from the KVNX Radar DataTree using xarray and Dask. (Left) Quasi-Vertical Profiles (QVPs) of four polarimetric variables for VCP-12 on May 20, 2011, following the method of Ryzhkov et al. (2016). Total compute time: 3.36 s. (Right) Quantitative Precipitation Estimation (QPE) by time-integrating radar reflectivity (DBZH) from VCP-212 sweeps over 4.7 days using the reflectivity–rain rate (Z–R) relation of Marshall and Palmer (1948). Compute time: 4.33 s on a 10-worker Dask cluster.
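
The two products in Figure 3 reduce to simple arithmetic, which the following minimal sketch makes explicit. It is not the authors' implementation: the function names are hypothetical, the coefficients a = 200 and b = 1.6 are the commonly quoted Marshall-Palmer values for Z = a·R^b, and the QVP height uses the flat-Earth approximation h ≈ r·sin(elevation), ignoring beam refraction and missing gates. In the xarray and Dask setting described in the caption, the same steps would be reductions over the azimuth and time dimensions; plain Python is used here only so the example is self-contained.

```python
import math


def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b to get rain rate R in mm/h.

    a and b default to the commonly quoted Marshall-Palmer values
    (an assumption; the coefficients used in the paper are not given here).
    """
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ -> linear reflectivity (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)


def accumulate_rain(dbz_series, dt_hours):
    """Time-integrate rain rate over equally spaced scans (rectangle rule).

    dbz_series holds one reflectivity value per scan at a fixed location;
    dt_hours is the assumed constant spacing between scans.
    """
    return sum(dbz_to_rain_rate(v) * dt_hours for v in dbz_series)


def quasi_vertical_profile(sweep, ranges_m, elevation_deg):
    """Azimuthal mean at each range gate of a single sweep.

    sweep is indexed as sweep[azimuth][range_gate]. Heights are a
    flat-Earth approximation: h = r * sin(elevation).
    """
    n_az = len(sweep)
    sin_el = math.sin(math.radians(elevation_deg))
    heights = [r * sin_el for r in ranges_m]
    profile = [
        sum(ray[gate] for ray in sweep) / n_az
        for gate in range(len(ranges_m))
    ]
    return heights, profile


if __name__ == "__main__":
    # Twelve scans, 5 minutes apart, all at 30 dBZ (illustrative values only).
    print(accumulate_rain([30.0] * 12, dt_hours=5.0 / 60.0))

    # Two rays, two range gates, at a 20 degree elevation angle.
    print(quasi_vertical_profile([[10.0, 20.0], [30.0, 40.0]], [1000.0, 2000.0], 20.0))
```

Keeping the Z–R inversion separate from the time integration mirrors how the computation would typically be laid out on a chunked array: an element-wise transform followed by a reduction along the time axis.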