Machine Learning-Ready Data Sets for the Analysis and Nowcasting of Atmospheric Radiation at Aviation Altitudes
Viacheslav M Sadykov, Zachary M Watkins, Dustin Kempton, William Jones, Sanjib K C, Griffin T Goodwin, Xiaochun He, W Kent Tobiska, Irina Kitiashvili, Christopher Mertens, Shubha Ranjan, D Glenn Deardorff, Ryan Spaulding
TL;DR
This work addresses the need to forecast atmospheric radiation at aviation altitudes, a safety-critical problem, by creating open-access ML-ready datasets that fuse ARMAS measurements with a broad suite of Geospace drivers. It introduces a partitioning scheme based on Gaussian Mixture Model clustering that keeps each partition representative while avoiding temporal leakage, and it provides three dataset variants (one static and two with dynamic time histories). In a test use case, a Random Forest nowcasts ARMAS dose rates with an RMSE of $3.80\,\mu\mathrm{Sv\;h^{-1}}$, slightly better than NAIRAS-v3 at $4.07\,\mu\mathrm{Sv\;h^{-1}}$, illustrating the viability of ML approaches for radiation forecasting in aviation. These ML-ready datasets provide a benchmark and a flexible foundation for further ML-driven nowcasting and forecasting, potentially accelerating improvements beyond physics-based models.
Abstract
Nowcasting and forecasting of the radiation environment in the Earth's lower atmosphere are critical for the safety of aircraft and spacecraft crews and passengers. Currently, this problem is addressed with statistical and physics-based models that take into account particle transport and precipitation. However, given the growing number of radiation measurements available to the community, it is now possible to start developing data-driven approaches. We prepared Machine Learning-ready (ML-ready) datasets for nowcasting effective dose rates at aviation altitudes. The datasets contain 92,476 individual measurements from 589 flights obtained by the Automated Radiation Measurements for Aerospace Safety (ARMAS) experiment from 2013 to 2023. The ARMAS measurements are augmented with properties of the Geospace environment, such as solar soft X-ray and proton fluxes, solar wind properties, secondary cosmic ray neutrons, space weather indices, and global solar activity indicators (such as the daily sunspot number). The ARMAS data are separated into three partitions, ensuring that (1) the data points from a single flight remain within the same partition, and (2) each partition samples the flight locations and Geospace environment conditions equally. Several versions of the datasets support predictions based either on point-in-time measurements or on up to 24 hours of Geospace parameter history. A test use case demonstrates that ARMAS measurements can be nowcast with accuracy slightly better than that of the physics-based models considered. The publicly available ML-ready datasets could serve as a first step in data preparation for ML-driven nowcasting and forecasting of the radiation environment.
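The two partitioning constraints described above (whole flights stay together, and each partition covers similar conditions) can be illustrated with a minimal sketch. This is not the authors' procedure. It assumes each flight is summarised by the mean of its feature vector, that a Gaussian Mixture Model groups flights with similar conditions, and that the flights in each group are dealt out evenly across partitions. The function name `partition_flights` and the parameters `n_clusters` and `seed` are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def partition_flights(flight_ids, features, n_partitions=3, n_clusters=4, seed=0):
    """Return a partition label per measurement without splitting any flight.

    Illustrative assumption: flights are clustered on their mean feature
    vector, then dealt across partitions cluster by cluster.
    """
    flight_ids = np.asarray(flight_ids)
    features = np.asarray(features, dtype=float)

    # Summarise each flight by the mean of its measurement features.
    unique_flights, inverse = np.unique(flight_ids, return_inverse=True)
    sums = np.zeros((len(unique_flights), features.shape[1]))
    np.add.at(sums, inverse, features)
    counts = np.bincount(inverse, minlength=len(unique_flights))
    flight_means = sums / counts[:, None]

    # Group flights that sample similar locations and conditions.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    clusters = gmm.fit_predict(flight_means)

    # Within each cluster, shuffle the flights and deal them out in turn.
    # The running offset keeps the overall partition sizes balanced.
    rng = np.random.default_rng(seed)
    flight_partition = np.empty(len(unique_flights), dtype=int)
    offset = 0
    for c in np.unique(clusters):
        members = rng.permutation(np.flatnonzero(clusters == c))
        flight_partition[members] = (np.arange(len(members)) + offset) % n_partitions
        offset += len(members)

    # Broadcast the per-flight label back to every measurement.
    return flight_partition[inverse]
```

Because labels are assigned per flight and only then broadcast to measurements, no flight can straddle two partitions, which is what prevents leakage between highly correlated consecutive measurements. The per-cluster dealing is one simple way to approximate equal sampling of conditions.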
