Effectiveness of High-Dimensional Distance Metrics on Solar Flare Time Series
Elaina Rohlfing, Azim Ahmadzadeh, V Aparna
TL;DR
This study examines whether elastic, high-dimensional distance measures (DTW, MSM, TWE) offer advantages over the Euclidean baseline for clustering-based solar flare forecasting on the SWAN-SF multivariate time-series dataset. Using a $k$-medoids framework with careful initialization, cluster-to-label mapping, and extensive hyperparameter tuning, the authors show that elastic measures fail to surpass Euclidean distance, even after optimizing parameters such as warping windows and transformation costs. A detailed analysis reveals that test-set imbalance and the stochastic nature of pre-flare activity limit the discriminative power of the flaring (FL; X- and M-class) clusters, with the non-flaring (NF) clusters often dominating performance gains. Because Euclidean distance remains a robust baseline for this domain and current elastic measures add little value, the findings motivate developing high-dimensional distance measures for time series that do not rely on point matching.
Abstract
Solar-flare forecasting has been extensively researched yet remains an open problem. In this paper, we investigate the contributions of elastic distance measures for detecting patterns in the SWAN-SF solar-flare dataset. We employ a simple $k$-medoids clustering algorithm to evaluate the effectiveness of advanced, high-dimensional distance metrics. Our results show that, despite thorough optimization, none of the elastic distances outperforms Euclidean distance by a significant margin. We demonstrate that, although elastic measures have shown promise for univariate time series, they effectively collapse to Euclidean distance when applied to the multivariate time series of SWAN-SF, which are characterized by the high stochasticity of solar activity. We conduct thousands of experiments and present both quantitative and qualitative evidence supporting this finding.
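To make the comparison concrete, the following is a minimal sketch (not the authors' implementation) of clustering multivariate time series with $k$-medoids on a precomputed distance matrix, once with Euclidean distance and once with DTW. Everything in it is an assumption made for illustration: the function names, the Sakoe-Chiba band used as the warping window, the random medoid initialization, and the synthetic random data standing in for SWAN-SF. It also shows one well-known property of windowed DTW: with a window of zero it reduces exactly to Euclidean distance, and widening the window can only lower the distance.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance between two equal-length series of shape (T, D)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def dtw(a, b, window):
    """DTW with a Sakoe-Chiba band of half-width `window` time steps.

    The local cost is the squared Euclidean distance between the two
    D-dimensional points, so all channels are warped together.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # band must be wide enough to reach (n, m)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = np.sum((a[i - 1] - b[j - 1]) ** 2)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def pairwise(series, metric):
    """Symmetric (N, N) distance matrix for a list/array of series."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = metric(series[i], series[j])
    return dist


def k_medoids(dist, k, n_iter=50, seed=0):
    """Alternating k-medoids on a precomputed distance matrix.

    Random initialization is an assumption of this sketch; the paper's
    own initialization scheme is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(dist), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Synthetic stand-in data (NOT SWAN-SF): 12 series, 60 time steps, 4 channels.
series = np.random.default_rng(0).normal(size=(12, 60, 4))
a, b = series[0], series[1]

D_euc = pairwise(series, euclidean)
D_dtw = pairwise(series, lambda x, y: dtw(x, y, window=5))

medoids_euc, labels_euc = k_medoids(D_euc, k=2)
medoids_dtw, labels_dtw = k_medoids(D_dtw, k=2)
```

Because the clustering step only sees the distance matrix, swapping in another elastic measure such as MSM or TWE would only require a different `metric` callable. The hyperparameter search described in the paper would correspond to repeating this procedure over a grid of values such as `window`.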
