A Novel Ranking Scheme for the Performance Analysis of Stochastic Optimization Algorithms using the Principles of Severity

Sowmya Chandrasekaran, Thomas Bartz-Beielstein

TL;DR

This work addresses the robust performance comparison of stochastic optimization algorithms across multiple problems under uncertainty. It introduces a football-league-style ranking built on a distribution-free, bootstrap-based hypothesis testing framework that uses severity $S \in [0,1]$ and a practical relevance threshold $\delta_p$ to combine statistical significance with the magnitude of the improvement. Pairwise algorithm comparisons yield points and a goal difference (GD) that are used to rank the algorithms, with Benjamini-Hochberg (BH) corrected p-values guiding the decisions. A case study on the PBO Suite yields results comparable to classical hypothesis testing (HT) while offering additional interpretability through GD and practical-significance weighting, which suggests broad applicability to ML/AI benchmarking.

Abstract

Stochastic optimization algorithms have been successfully applied in several domains to find optimal solutions. Because of the ever-growing complexity of integrated systems, novel stochastic algorithms are continually being proposed, which makes the performance analysis of these algorithms extremely important. In this paper, we provide a novel ranking scheme to rank algorithms over multiple single-objective optimization problems. The results of the algorithms are compared using a robust bootstrapping-based hypothesis testing procedure that is based on the principles of severity. Analogous to the football league scoring scheme, we propose pairwise comparison of algorithms as in a league competition. Each algorithm accumulates points together with a performance metric that captures how well or how badly it performed against the other algorithms, analogous to the goal-difference metric in the football league scoring system. The goal-difference metric can be used not only as a tie-breaker but also to quantify the performance of each algorithm. The key novelty of the proposed ranking scheme is that it accounts for the magnitude of the achieved performance improvement along with its practical relevance, and it makes no distributional assumptions. The proposed ranking scheme is compared to classical hypothesis testing; the analysis shows that the results are comparable and that the proposed ranking offers several additional benefits.
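
The league idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 3/1/0 points scheme borrowed from football, the shift-based bootstrap test on mean differences, the `support` value used as a stand-in for the severity-based goal difference, and the hypothetical names `bootstrap_compare` and `league_table`. The BH correction of the p-values is omitted for brevity, and minimization is assumed.

```python
import random
from itertools import combinations
from statistics import mean


def resample(xs, rng):
    """Draw a bootstrap sample (with replacement) of the same size as xs."""
    return [rng.choice(xs) for _ in xs]


def bootstrap_compare(a, b, delta_p, n_boot, rng):
    """Return (p_value, support) for the claim 'a is better (lower) than b'.

    p_value: one-sided bootstrap p-value under H0 of equal means.
    support: share of resamples whose improvement exceeds delta_p
             (an illustrative stand-in, not the paper's severity).
    """
    mean_a, mean_b = mean(a), mean(b)
    observed = mean_b - mean_a
    pooled = mean(a + b)
    # Shift both samples to a common mean so that H0 holds by construction.
    a0 = [x - mean_a + pooled for x in a]
    b0 = [x - mean_b + pooled for x in b]
    exceed = 0
    relevant = 0
    for _ in range(n_boot):
        if mean(resample(b0, rng)) - mean(resample(a0, rng)) >= observed:
            exceed += 1
        if mean(resample(b, rng)) - mean(resample(a, rng)) > delta_p:
            relevant += 1
    return (exceed + 1) / (n_boot + 1), relevant / n_boot


def league_table(results, alpha=0.05, delta_p=0.0, n_boot=1000, seed=0):
    """results: {algorithm: {problem: [final objective values]}}."""
    rng = random.Random(seed)
    table = {name: {"points": 0, "gd": 0.0} for name in results}
    problems = list(next(iter(results.values())))
    for problem in problems:
        for x, y in combinations(results, 2):
            a, b = results[x][problem], results[y][problem]
            p_xy, s_xy = bootstrap_compare(a, b, delta_p, n_boot, rng)
            p_yx, s_yx = bootstrap_compare(b, a, delta_p, n_boot, rng)
            if p_xy < alpha:
                winner, loser, goals = x, y, s_xy
            elif p_yx < alpha:
                winner, loser, goals = y, x, s_yx
            else:
                # No significant difference: a draw, one point each.
                table[x]["points"] += 1
                table[y]["points"] += 1
                continue
            table[winner]["points"] += 3
            table[winner]["gd"] += goals
            table[loser]["gd"] -= goals
    # Rank by points; the goal difference breaks ties.
    return sorted(
        table.items(),
        key=lambda kv: (kv[1]["points"], kv[1]["gd"]),
        reverse=True,
    )
```

Calling `league_table` with a dictionary of per-problem run results returns the algorithms sorted by points, with the goal difference as the secondary key, which mirrors the tie-breaker role the abstract assigns to it.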

Paper Structure

This paper contains 8 sections, 4 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of two scenarios of $S_{r}$ under the alternative hypothesis. In both cases, the observed test statistic $d(\mathbf{x})$ falls beyond the cut-off point $u_{1-\alpha}$, so the decision is to reject the null hypothesis. $S_{r}$ is the area under the $H_1$ distribution bounded by $d(\mathbf{x})$ (shaded in blue). Although the decision is the same in both cases, severity sheds light on the power actually attained by the test. In (a), there is less support for the decision (smaller blue area) because $d(\mathbf{x})$ is close to the cut-off point; in (b), there is more support for the decision (larger blue area) because $d(\mathbf{x})$ lies far beyond the cut-off point.
  • Figure 2: Function-wise ranking metrics of the Rank 1: (1+1)EA and Rank 2: MIES-ERT algorithms.
  • Figure 3: Proposed Ranking Scheme: Distribution of the points attained by each algorithm for all 25 problems.
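
The two scenarios in Figure 1 can be reproduced numerically with a small sketch. It assumes, purely for illustration, that the test statistic is standard normal under $H_0$ and normal with unit variance and a hypothetical mean `shift` under $H_1$; the paper itself uses a distribution-free bootstrap procedure, so this parametric form and the function names are assumptions, not the authors' method.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def severity_of_rejection(d_obs, shift):
    """S_r = P(d(X) <= d_obs) when d(X) ~ N(shift, 1) under H1.

    This is the area under the H1 density to the left of the observed
    statistic, i.e. the blue region described in the Figure 1 caption.
    """
    return normal_cdf(d_obs - shift)


# Hypothetical values: cut-off u_{0.95} of about 1.645 and an H1 mean of 1.5.
cutoff = 1.645
shift = 1.5

s_close = severity_of_rejection(1.8, shift)  # scenario (a): just past the cut-off
s_far = severity_of_rejection(4.0, shift)    # scenario (b): far past the cut-off

print(f"(a) d(x)=1.8 rejects H0, severity = {s_close:.3f}")
print(f"(b) d(x)=4.0 rejects H0, severity = {s_far:.3f}")
```

Both observed statistics exceed the cut-off, so both lead to rejection, but the second one yields a larger severity, which is the distinction the caption draws between panels (a) and (b).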