Machine Learning-Ready Data Sets for the Analysis and Nowcasting of Atmospheric Radiation at Aviation Altitudes

Viacheslav M Sadykov, Zachary M Watkins, Dustin Kempton, William Jones, Sanjib K C, Griffin T Goodwin, Xiaochun He, W Kent Tobiska, Irina Kitiashvili, Christopher Mertens, Shubha Ranjan, D Glenn Deardorff, Ryan Spaulding

TL;DR

This work addresses the need to forecast atmospheric radiation at aviation altitudes, a safety-critical problem, by creating open-access ML-ready datasets that fuse ARMAS measurements with a broad suite of Geospace drivers. It introduces a partitioned data design based on Gaussian Mixture Model clustering to ensure representative coverage while avoiding temporal leakage, and provides three dataset variants (one static and two with dynamic time histories). In a use case, a Random Forest nowcasts ARMAS dose rates with an RMSE of $3.80\,\mu\mathrm{Sv\;h^{-1}}$, slightly better than the $4.07\,\mu\mathrm{Sv\;h^{-1}}$ of NAIRAS-v3, illustrating the viability of ML approaches for radiation forecasting in aviation. These ML-ready datasets provide a benchmark and a flexible foundation for further ML-driven nowcasting and forecasting, potentially accelerating improvements beyond physics-based models.

Abstract

Nowcasting and forecasting of the radiation environment in the Earth's lower atmosphere are critical for the safety of aircraft and spacecraft crews and passengers. Currently, this problem is addressed by employing statistical and physics-based models that take into account particle transport and precipitation. However, given the increased number of radiation measurements available to the community, it is possible to start developing data-driven approaches. We prepared Machine Learning-ready (ML-ready) datasets to nowcast the effective dose rates at aviation altitudes. The presented datasets contain 92,476 individual measurements from 589 flights obtained by the Automated Radiation Measurements for Aerospace Safety (ARMAS) experiment from 2013 to 2023. The ARMAS measurements are augmented with the properties of the Geospace environment, such as solar soft X-ray and proton fluxes, solar wind properties, secondary cosmic ray neutrons, space weather indexes, and global solar activity indicators (such as daily sunspot number). ARMAS data are separated into three partitions, ensuring that (1) the data points from a single flight remain within the same partition, and (2) each partition samples the flight locations and Geospace environment conditions equally. Several versions of the datasets support predictions based either on point-in-time measurements or on up to 24 hours of Geospace parameter history. A test use case demonstrates that ARMAS measurements can be nowcast with accuracy slightly better than that of the physics-based models considered. The publicly available ML-ready datasets could serve as the first step in data preparation for ML-driven nowcasting and forecasting of the radiation environment.
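
The two partitioning constraints stated in the abstract (whole flights stay together, and each partition samples similar conditions) can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes each flight has already been given a single cluster label, for example from a Gaussian Mixture Model fitted to location and Geospace parameters; the function name `partition_flights`, the flight IDs, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def partition_flights(flight_clusters, n_partitions=3, seed=0):
    """Assign whole flights to partitions, spreading each cluster evenly.

    flight_clusters: dict mapping flight ID -> cluster label (assumed to be
    precomputed, e.g. by a GMM; this input format is hypothetical).
    Returns a dict mapping flight ID -> partition index.
    """
    rng = random.Random(seed)

    # Group flight IDs by their cluster label.
    by_cluster = defaultdict(list)
    for flight_id, cluster in flight_clusters.items():
        by_cluster[cluster].append(flight_id)

    assignment = {}
    for cluster in sorted(by_cluster):
        flights = sorted(by_cluster[cluster])
        rng.shuffle(flights)
        # Random starting offset so small clusters do not all fill partition 0.
        offset = rng.randrange(n_partitions)
        # Deal the flights of this cluster round-robin across partitions.
        for i, flight_id in enumerate(flights):
            assignment[flight_id] = (i + offset) % n_partitions
    return assignment


# Toy example: 12 hypothetical flights split between 2 clusters.
flight_clusters = {f"F{i:02d}": i % 2 for i in range(12)}
assignment = partition_flights(flight_clusters)

# Each measurement inherits the partition of its flight, so no flight is split.
measurements = [("F00", 3.1), ("F00", 3.4), ("F05", 7.9)]
measurement_partitions = [assignment[fid] for fid, _ in measurements]
```

Because the partition is decided per flight and only then propagated to individual measurements, correlated points from one flight cannot end up on both sides of a train/test split. Dealing flights round-robin within each cluster keeps the cluster proportions roughly equal across partitions.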

Paper Structure (13 sections, 1 equation, 8 figures)

Figures (8)

  • Figure 1: Geographic coverage of the ARMAS flight measurements considered. The individual measurements are marked as orange points on the map.
  • Figure 2: Evolution of selected Geospace environment parameters from June 2013 until December 2024, from top to bottom: daily sunspot number, corrected neutron monitor counts from the four stations considered, integrated soft X-ray fluxes, energetic proton fluxes, geomagnetic indexes (Kp and Dst), and solar wind velocities at L1. Gray lines in the background mark the times covered by the ARMAS flight measurements.
  • Figure 3: (a) Schematic structure of the ML-ready dataset entity. A target corresponds to the measurement of radiation dose rate during the ARMAS flight, and the feature vector represents the flight timing and coordinates, NAIRAS predictions, and prehistory of the measurements of the environment. (b) Illustration of the subdivision of ARMAS flights into partitions.
  • Figure 4: Distribution of the relative KL divergence and of the relative standard deviation of the distribution widths across the three dataset partitions, as a function of the number of GMM clusters. The vertical dashed line marks the number of clusters selected for the final dataset construction.
  • Figure 5: Distribution of the parameters used for clustering the data points (barometric altitude, geomagnetic longitude, geomagnetic latitude, daily sunspot number, and Dst index) within each partition of the dataset. Each row corresponds to a single parameter. The partition is indicated in the header of each column.
  • ...and 3 more figures