Effectiveness of High-Dimensional Distance Metrics on Solar Flare Time Series

Elaina Rohlfing, Azim Ahmadzadeh, V Aparna

TL;DR

This study examines whether elastic, high-dimensional distance measures (DTW, MSM, TWE) offer advantages over the Euclidean baseline for clustering-based solar flare forecasting on the SWAN-SF multivariate time-series dataset. Using a $k$-medoids framework with careful initialization, cluster-to-label mapping, and extensive hyperparameter tuning, the authors show that elastic measures fail to surpass Euclidean distance, even after optimizing parameters such as warping windows and transformation costs. A detailed analysis reveals that test-set imbalance and the stochastic nature of pre-flare activity limit the discriminative power of FL (X/M) clusters, with NF clusters often dominating performance gains. Because Euclidean distance remains a robust baseline for this domain and current elastic metrics add little value, the findings motivate the development of high-dimensional distance measures for time series that do not rely on point matching.

Abstract

Solar-flare forecasting has been extensively researched yet remains an open problem. In this paper, we investigate the contributions of elastic distance measures for detecting patterns in the solar-flare dataset, SWAN-SF. We employ a simple $k$-medoids clustering algorithm to evaluate the effectiveness of advanced, high-dimensional distance metrics. Our results show that, despite thorough optimization, none of the elastic distances outperform Euclidean distance by a significant margin. We demonstrate that, although elastic measures have shown promise for univariate time series, when applied to the multivariate time series of SWAN-SF, characterized by the high stochasticity of solar activity, they effectively collapse to Euclidean distance. We conduct thousands of experiments and present both quantitative and qualitative evidence supporting this finding.
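
The relationship between an elastic measure and the Euclidean baseline can be made concrete with a minimal sketch. The code below is not the authors' implementation: it assumes the "dependent" multivariate form of DTW (squared Euclidean cost between whole observation vectors), equal-length series, and a Sakoe-Chiba band whose half-width is the fraction `w_sc` of the series length. The function names and the random toy data are hypothetical. With a zero-width band the only admissible path is the diagonal, so DTW reduces exactly to Euclidean distance; widening the band can only lower the value.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two series of shape (T, d)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw_sakoe_chiba(a, b, w_sc=0.2):
    """DTW restricted to a Sakoe-Chiba band (assumed: dependent multivariate form).

    a, b : arrays of shape (T, d), or (T,) for univariate series.
    w_sc : band half-width as a fraction of the series length (an assumption).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide for a path to exist.
    band = max(int(w_sc * max(n, m)), abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = np.sum((a[i - 1] - b[j - 1]) ** 2)  # point-to-point cost
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # diagonal match
    return float(np.sqrt(cost[n, m]))

# Illustrative data only: two random-walk series, 60 steps, 2 parameters.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 2)).cumsum(axis=0)
y = rng.normal(size=(60, 2)).cumsum(axis=0)

print(euclidean(x, y))
print(dtw_sakoe_chiba(x, y, w_sc=0.0))  # equals the Euclidean value
print(dtw_sakoe_chiba(x, y, w_sc=0.2))  # never larger than the Euclidean value
```

How far the banded DTW value falls below the Euclidean one depends on how much exploitable temporal misalignment the pair of series contains, which is the quantity the paper's experiments probe on SWAN-SF.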

Paper Structure

This paper contains 17 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Illustration of the alignment (point mapping) of DTW (left), MSM (middle), and TWE (right) when the path is constrained by a Sakoe-Chiba band ($w_{sc}=0.2$).
  • Figure 2: Partition 2 validation results for $k$-medoids. Each plot shows TSS and HSS scores for three initialization methods and one distance measure.
  • Figure 3: TSS and HSS scores for Partition 2 (validation partition) for $k \in \{2,3,\dots,100\}$.
  • Figure 4: Visualization of the DTW point mapping between two time series of SWAN-SF with label FL; the parameters shown are TOTPOT and TOTUSJH.
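
The TSS and HSS scores reported in Figures 2 and 3 are skill scores computed from a binary confusion matrix. As a rough sketch, assuming FL is the positive class and that HSS refers to the Heidke skill score measured against a random forecast (the form commonly used in flare forecasting; the summary above does not say which variant the paper uses), they can be computed as follows. The function names and the example counts are hypothetical, and both classes are assumed to be present so that no denominator is zero.

```python
def tss(tp, fn, fp, tn):
    """True skill statistic: recall on FL minus false-alarm rate on NF."""
    return tp / (tp + fn) - fp / (fp + tn)

def hss(tp, fn, fp, tn):
    """Heidke skill score relative to a random forecast (assumed variant)."""
    numerator = 2.0 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator

# Hypothetical, imbalanced counts: 40 FL instances and 960 NF instances.
tp, fn, fp, tn = 30, 10, 20, 940
print(round(tss(tp, fn, fp, tn), 3))  # 0.729
print(round(hss(tp, fn, fp, tn), 3))  # 0.651
```

TSS is built from two rates that are each normalized within one class, so it does not change with the FL-to-NF ratio, whereas HSS does. That is one reason the two scores are usually reported together on imbalanced data such as SWAN-SF.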