Using ML-based Regression Techniques to Mitigate GOES Energetic Proton Flux Data Contamination and Magnetospheric Effects

Aatiya Ali, Viacheslav Sadykov

TL;DR

GOES proton flux measurements in the 10–50 MeV range are contaminated by high-energy protons, biasing SEP analyses. The study trains ML regression models (RF, XGBoost, MLP) to predict concurrent, cleaner EPHIN flux measurements at L1 from GOES fluxes and geomagnetic indices, using CC and non-CC events to build training data. RF, especially with geomagnetic inputs, best reconstructs uncontaminated flux profiles (R^2 ≈ 0.83, RMSE ≈ 4.1×10^4 pfu^2), offering a practical data-correction pathway for GOES data. This approach improves SEP data reliability and forecasting potential for near-Earth and lunar mission planning by providing corrected flux proxies for SPE analysis.

Abstract

Positioned in geostationary orbit (GEO), ~36,000 km above Earth, NOAA's GOES series has recorded real-time energetic proton flux measurements crucial for space weather monitoring for over three decades. Although machine learning models have advanced solar energetic particle (SEP) event prediction using GOES data, the sudden yet sparse nature of SEP events necessitates high-quality proton flux measurements. Previous studies have identified contamination in GOES data: higher-energy protons can cause parasitic signals in lower-energy GOES channels, leading to artificially elevated fluxes in lower energy ranges (e.g., 10–50 MeV). As of now, no universal correction method has been implemented for the publicly available NOAA data. In addition, the effects of Earth's magnetosphere on 10–50 MeV particles are not yet fully understood. This study assesses a reconstruction method using concurrent solar proton event (SPE) measurements from SOHO-EPHIN, which align well with GOES measurements of SPEs across solar cycle 23 and the bulk of cycle 24 but represent the environment outside the magnetosphere, at the Lagrange 1 point. We train regression models on GOES proton fluxes across multiple energy bins, employing EPHIN fluxes as prediction targets. We expect that similar approaches can yield non-contaminated flux proxies that preserve valuable data and more accurately capture the characteristics of SPEs, providing a more stable dataset for analyzing SEP behavior and potentially improving SEP event prediction models.
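The regression setup described in the abstract (GOES channel fluxes as inputs, EPHIN flux as the target) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the data are synthetic, the contamination coefficient of 0.4 is invented, the function names are hypothetical, and a plain linear least-squares fit stands in for the RF, XGBoost, and MLP models used in the study. It only shows the general idea of learning a correction from a contaminated low-energy channel plus a higher-energy channel, then scoring it with R^2 and RMSE.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, features):
    """Apply the fitted intercept and slopes to each feature tuple."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], f)) for f in features]


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic_samples(n, seed):
    """Synthetic stand-ins only; none of these numbers come from the paper."""
    rng = random.Random(seed)
    features, targets = [], []
    for _ in range(n):
        clean_low = rng.uniform(1.0, 100.0)  # plays the role of the EPHIN flux
        high = rng.uniform(0.0, 50.0)        # plays the role of a >50 MeV GOES channel
        # Assumed contamination: part of the high-energy signal leaks into
        # the low-energy channel, plus a little measurement noise.
        goes_low = clean_low + 0.4 * high + rng.gauss(0.0, 1.0)
        features.append((goes_low, high))
        targets.append(clean_low)
    return features, targets


train_x, train_y = make_synthetic_samples(400, seed=0)
test_x, test_y = make_synthetic_samples(200, seed=1)

weights = fit_least_squares(train_x, train_y)
corrected = predict(weights, test_x)
uncorrected = [goes_low for goes_low, _ in test_x]  # use the raw channel as-is

print("fitted weights (intercept, low, high):", [round(w, 3) for w in weights])
print("R^2  raw vs corrected:", r2_score(test_y, uncorrected), r2_score(test_y, corrected))
print("RMSE raw vs corrected:", rmse(test_y, uncorrected), rmse(test_y, corrected))
```

In this toy setting the fitted weight on the higher-energy channel comes out negative, which is the linear analogue of subtracting the parasitic signal. Real contamination and magnetospheric effects need not be linear, which is one plausible reason to prefer the nonlinear regressors the study evaluates.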

Paper Structure

This paper contains 7 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Differences in SPE start times observed by GOES and EPHIN across SCs 23 & 24. Each bar represents a single SPE, with the x-axis enumerating the events. Bar colors indicate which instrument detected the earlier time, while the background reflects the corresponding SC. Instances where bars are absent (e.g., onset for event #3) indicate no measurable offset, i.e., both instruments detected the property simultaneously ($\Delta$ = 0 hrs).
  • Figure 2: Comparison of GOES peak fluxes (left) and fluences (right) with EPHIN during SPEs across SCs 23 & 24.
  • Figure 3: (a) An SPE where the 10–50 MeV proton flux profile closely mirrors the signals in the higher-energy GOES channels, indicating potential contamination. (b) An SPE showing no obvious contamination signatures, with proton channels above 50 MeV showing significantly lower signals compared to the 10–50 MeV channel, as expected.
  • Figure 4: Comparison of MLP, XGBoost, and RF flux predictions with fluxes observed by GOES and EPHIN during the SPE on 01/16/2005.
  • Figure 5: Comparison of regression model performance in reconstructing 'clean' SEP flux profiles from GOES data. Results are shown for three machine learning models (MLP, XGBoost, and RF) under two scenarios: with and without geomagnetic indices included as additional input features. For context, baseline results from a non-ML prediction method are also included.
  • ...and 1 more figure