Radar DataTree: A FAIR and Cloud-Native Framework for Scalable Weather Radar Archives
Alfonso Ladino-Rincon, Stephen W. Nesbitt
TL;DR
Radar DataTree addresses the fragmentation of weather radar archives with a dataset-level, FAIR-aligned framework that extends FM-301/CfRadial 2.1 from single volume scans to time-resolved collections. It uses xarray.DataTree to form a hierarchical, time-indexed radar archive, serialized as Zarr and persisted with Icechunk for ACID guarantees, while the Raw2Zarr pipeline handles end-to-end ETL through Xradar for cloud-native processing. Validation on real archives (e.g., the NEXRAD KVNX radar) shows substantial performance gains for core workflows such as Quasi-Vertical Profile (QVP) generation and Quantitative Precipitation Estimation (QPE), and transactional updates support reproducibility. The result is a reproducible, extensible foundation for scalable radar data stewardship that enables AI-ready workflows and efficient multi-scan analysis in cloud environments.
Abstract
We introduce Radar DataTree, the first dataset-level framework that extends the WMO FM-301 standard from individual radar volume scans to time-resolved, analysis-ready archives. Weather radar data are among the most scientifically valuable yet structurally underutilized Earth observation datasets. Despite widespread public availability, radar archives remain fragmented, vendor-specific, and poorly aligned with FAIR (Findable, Accessible, Interoperable, Reusable) principles, hindering large-scale research, reproducibility, and cloud-native computation. Radar DataTree addresses these limitations with a scalable, open-source architecture that transforms operational radar archives into FAIR-compliant, cloud-optimized datasets. Built on the FM-301/CfRadial 2.1 standard and implemented using xarray.DataTree, Radar DataTree organizes radar volume scans as hierarchical, metadata-rich structures and serializes them to Zarr for scalable analysis. Coupled with Icechunk for ACID-compliant storage and versioning, this architecture enables efficient, parallel computation across thousands of radar scans with minimal preprocessing. We demonstrate significant performance gains in case studies including Quasi-Vertical Profile (QVP) and precipitation accumulation workflows, and release all tools and datasets openly via the Raw2Zarr repository. This work contributes a reproducible and extensible foundation for radar data stewardship, high-performance geoscience, and AI-ready weather infrastructure.
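
To make the move from single scans to a time-resolved collection concrete, the following is a minimal conceptual sketch, not the Raw2Zarr implementation. Plain Python dictionaries stand in for xarray.DataTree nodes so the example runs with the standard library alone. The scan-pattern label "VCP-212", the sweep names "sweep_0" and "sweep_1", the field name "DBZH", the sample values, and the helpers build_archive and select_time are all illustrative assumptions rather than names taken from the framework.

```python
from datetime import datetime, timezone


def build_archive(scans):
    """Group per-scan sweeps into one tree with a time axis on each sweep node.

    Assumed (hypothetical) layout: archive[vcp][sweep_name] holds parallel
    lists "time" and "DBZH", one entry per volume scan.
    """
    archive = {}
    # Sort by scan time so every sweep node ends up with a monotonic time axis.
    for scan in sorted(scans, key=lambda s: s["time"]):
        vcp_node = archive.setdefault(scan["vcp"], {})
        for sweep_name, sweep in scan["sweeps"].items():
            node = vcp_node.setdefault(
                sweep_name,
                {"elevation": sweep["elevation"], "time": [], "DBZH": []},
            )
            node["time"].append(scan["time"])
            node["DBZH"].append(sweep["DBZH"])
    return archive


def select_time(node, start, end):
    """Return the (time, DBZH) pairs of one sweep node within [start, end]."""
    return [
        (t, values)
        for t, values in zip(node["time"], node["DBZH"])
        if start <= t <= end
    ]


def utc(hour, minute):
    """Build a timezone-aware timestamp on an arbitrary example day."""
    return datetime(2011, 5, 20, hour, minute, tzinfo=timezone.utc)


# Three made-up volume scans, deliberately out of chronological order.
scans = [
    {"time": utc(10, 10), "vcp": "VCP-212", "sweeps": {
        "sweep_0": {"elevation": 0.5, "DBZH": [12.0, 14.5]},
        "sweep_1": {"elevation": 1.5, "DBZH": [9.0, 10.5]}}},
    {"time": utc(10, 0), "vcp": "VCP-212", "sweeps": {
        "sweep_0": {"elevation": 0.5, "DBZH": [10.0, 11.0]},
        "sweep_1": {"elevation": 1.5, "DBZH": [7.5, 8.0]}}},
    {"time": utc(10, 5), "vcp": "VCP-212", "sweeps": {
        "sweep_0": {"elevation": 0.5, "DBZH": [11.0, 13.0]},
        "sweep_1": {"elevation": 1.5, "DBZH": [8.0, 9.5]}}},
]

archive = build_archive(scans)

# A multi-scan query becomes a slice along time on one node,
# rather than a loop that opens one file per scan.
subset = select_time(archive["VCP-212"]["sweep_0"], utc(10, 0), utc(10, 5))
```

In the framework described above, each sweep node would be an xarray dataset inside a DataTree, and the tree would be serialized to Zarr, so the same time selection could be evaluated lazily and in parallel instead of over in-memory lists.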
