Using ML-based Regression Techniques to Mitigate GOES Energetic Proton Flux Data Contamination and Magnetospheric Effects
Aatiya Ali, Viacheslav Sadykov
TL;DR
GOES proton flux measurements in the 10–50 MeV range are contaminated by higher-energy protons, which biases solar energetic particle (SEP) analyses. The study trains ML regression models (random forest (RF), XGBoost, and a multilayer perceptron (MLP)) to predict concurrent, cleaner EPHIN flux measurements at L1 from GOES fluxes and geomagnetic indices, using CC and non-CC events to build the training data. RF, especially with geomagnetic inputs, best reconstructs uncontaminated flux profiles (R^2 ≈ 0.83, RMSE ≈ 4.1×10^4 pfu^2), offering a practical correction pathway for GOES data. By providing corrected flux proxies for solar proton event (SPE) analysis, this approach improves SEP data reliability and forecasting potential for near-Earth and lunar mission planning.
Abstract
Positioned in geostationary orbit (GEO) ~36,000 km above Earth, NOAA's GOES series has recorded real-time energetic proton flux measurements crucial for space weather monitoring for over three decades. Although machine learning models have advanced solar energetic particle (SEP) event prediction using GOES data, the sudden yet sparse nature of SEP events necessitates high-quality proton flux measurements. Previous studies have identified contamination in GOES data: higher-energy protons can produce parasitic signals in lower-energy GOES channels, leading to artificially elevated fluxes in lower energy ranges (e.g., 10–50 MeV). To date, no universal correction method has been implemented for the publicly available NOAA data. In addition, the effects of Earth's magnetosphere on 10–50 MeV particles are not yet fully understood. This study assesses a reconstruction method using concurrent solar proton event (SPE) measurements from SOHO-EPHIN, which align well with GOES measurements of SPEs across solar cycle 23 and the bulk of cycle 24 but represent the environment outside the magnetosphere at the Lagrange 1 (L1) point. We train regression models on GOES proton fluxes across multiple energy bins, using EPHIN fluxes as prediction targets. We expect that similar approaches can yield non-contaminated flux proxies that preserve valuable data and more accurately capture the characteristics of SPEs, providing a more stable dataset for analyzing SEP behavior and potentially improving SEP event prediction models.
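
The regression setup described above can be illustrated with a minimal sketch. It is not the authors' pipeline: all data are synthetic, the two GOES-like channels, the 10% leakage fraction, the Kp-like index, and the hyperparameters are assumptions chosen only for illustration, and scikit-learn's RandomForestRegressor stands in for the models used in the study. The idea shown is simply that contaminated multi-channel fluxes (plus a geomagnetic index) serve as features and a cleaner L1 flux serves as the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 2000

# Synthetic "clean" low-energy flux at L1 (stand-in for an EPHIN channel).
log_clean = rng.uniform(-1.0, 3.0, size=n_samples)
# Synthetic higher-energy flux that leaks into the low-energy channel.
log_high = rng.uniform(-1.0, 3.0, size=n_samples)
# Hypothetical geomagnetic index on a Kp-like 0-9 scale.
geo_index = rng.uniform(0.0, 9.0, size=n_samples)

clean = 10.0 ** log_clean
high = 10.0 ** log_high

# Assumed contamination model: the low-energy GOES-like channel sees the clean
# flux (weakly modulated by the index) plus 10% of the high-energy flux.
noise_low = 10.0 ** rng.normal(0.0, 0.05, size=n_samples)
noise_high = 10.0 ** rng.normal(0.0, 0.05, size=n_samples)
goes_low = (clean * (1.0 + 0.02 * geo_index) + 0.1 * high) * noise_low
goes_high = high * noise_high

# Features: log10 fluxes in both channels plus the index. Target: clean log10 flux.
X = np.column_stack([np.log10(goes_low), np.log10(goes_high), geo_index])
y = log_clean

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate with the same kinds of metrics quoted in the TL;DR (here in log10 units).
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f} dex")
```

Working in log10 flux keeps the several orders of magnitude spanned by proton fluxes on a comparable scale; whether the study applies such a transform is not stated here, so treat it as a design choice of this sketch. With real data, a chronological or event-wise train/test split would likely be preferable to the random split used above, to avoid leaking samples from the same event into both sets.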
