On the Fair Comparison of Optimization Algorithms in Different Machines

Etor Arza, Josu Ceberio, Ekhiñe Irurozki, Aritz Pérez

TL;DR

This work tackles the fair benchmarking of optimization algorithms across different machines when the code of some algorithms is unavailable. It introduces an estimation framework that predicts the equivalent runtime $t_2$ on a target machine from a reference process and machine scores, with a bias-correction parameter $\gamma$ that bounds the overestimation probability $p_\gamma$. A modified one-sided sign test with a corrected p-value $\hat{p}_c$ is proposed to control the type I error under these runtime estimates. Validation across multiple CPUs and optimization tasks shows accurate runtime prediction and robust statistical conclusions. The methodology enables principled cross-machine comparisons and comes with practical scripts and tutorials, so that researchers can apply it without re-running the original algorithms.

Abstract

An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed on the same machine if they are to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, which means that running the algorithms to be compared on the same machine is not always possible. Even when the code is available, some optimization algorithms might be costly to run, such as training large neural networks in the cloud. In this paper, we consider the following problem: how do we compare the performance of a new optimization algorithm B with a known algorithm A from the literature if we only have the results (the objective values) and the runtime of algorithm A in each instance? In particular, we present a methodology that enables a statistical analysis of the performance of algorithms executed on different machines. The proposed methodology has two parts. First, we propose a model that, given the runtime of an algorithm on one machine, estimates the runtime of the same algorithm on another machine. This model can be adjusted so that the probability of estimating a runtime longer than it should be is arbitrarily low. Second, we introduce an adaptation of the one-sided sign test that uses a modified p-value and takes that probability into account. This adaptation avoids increasing the probability of type I error associated with executing algorithms A and B on different machines.
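
For orientation, the standard (unmodified) one-sided sign test that the second part of the methodology adapts can be sketched as follows. This is only the textbook baseline, not the corrected p-value proposed in the paper, whose formula is not given in this summary. The function name `one_sided_sign_test`, the choice to drop ties, and the assumption that lower objective values are better are all illustrative assumptions.

```python
from math import comb


def one_sided_sign_test(scores_a, scores_b):
    """Textbook one-sided sign test (NOT the paper's corrected version).

    Assumption: minimization, so B "wins" an instance when its objective
    value is strictly lower than A's. Ties are dropped.
    Returns the p-value for H1: "B outperforms A more often than not".
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("Both algorithms need one score per instance.")

    wins_b = sum(b < a for a, b in zip(scores_a, scores_b))
    wins_a = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b  # instances without ties

    if n == 0:
        return 1.0  # no evidence either way

    # P[K >= wins_b] with K ~ Binomial(n, 1/2) under the null hypothesis
    return sum(comb(n, k) for k in range(wins_b, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical objective values on four instances.
    print(one_sided_sign_test([5, 6, 7, 8], [1, 2, 3, 4]))
```

When B's scores are obtained on a different machine with an estimated runtime budget, this baseline p-value no longer accounts for the chance that B was given too much time, which is the gap the paper's modified p-value is meant to close.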

Paper Structure

This paper contains 25 sections, 4 theorems, 53 equations, 8 figures, 7 tables, and 1 algorithm.

Key Result

Lemma 1

Let $n$ be an integer, $X$ and $Y$ two random variables. Let $X_1,...,X_n$ be $n$ independent random variables distributed as $X$. Let $Y_1,...,Y_n$ be $n$ independent random variables distributed as $Y$. Let $v_x$ and $v_y$ be two possible outcomes of the random variables $X$ and $Y$ respectively, and II) If $\mathcal{P}[Y = v_y \ | \ X = v_x] = 1$ and $\mathcal{P}[X = v_x \ | \ Y = v_y] = 1$ t

Figures (8)

  • Figure 4: A comparison in estimation error of the predicted equivalent runtime (Equivalent runtime) and simply using the same runtime in both machines (Same runtime) with respect to the true equivalent runtime. The estimation error is measured as the log deviation ratio of the prediction of the equivalent runtime with respect to the true equivalent runtime. A value closer to 0 indicates a lower prediction error.
  • Figure 5: A comparison in estimation error of the equivalent runtime with the centered estimator. The estimation error for the optimization processes and CPUs used to fit the estimator (Train), and these new validation optimization processes and CPUs (Validation) are compared. The estimation error is measured as the log deviation ratio of the prediction of the equivalent runtime with respect to the true equivalent runtime. A value closer to 0 indicates a lower prediction error.
  • Figure : Diagram of the estimation of the equivalent runtime
  • Figure : PassMark single-thread score and the runtime $\rho'$
  • Figure : Estimated runtime and the correction parameter $\gamma$
  • ...and 3 more figures
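
The captions of Figures 4 and 5 measure estimation error as a log deviation ratio. A minimal sketch of one plausible reading is given below, assuming the metric is the natural logarithm of the predicted equivalent runtime divided by the true equivalent runtime. The exact definition, including the logarithm base, is not stated in this summary, and the function name `log_deviation_ratio` is hypothetical.

```python
import math


def log_deviation_ratio(predicted_runtime, true_runtime):
    """Assumed definition: log(predicted / true).

    0 means a perfect prediction, positive values mean the runtime was
    overestimated, negative values mean it was underestimated.
    """
    if predicted_runtime <= 0 or true_runtime <= 0:
        raise ValueError("Runtimes must be strictly positive.")
    return math.log(predicted_runtime / true_runtime)


if __name__ == "__main__":
    # Hypothetical runtimes in seconds.
    print(log_deviation_ratio(12.0, 10.0))  # overestimate, positive
    print(log_deviation_ratio(8.0, 10.0))   # underestimate, negative
```

Under this reading, over- and underestimation by the same factor give values of equal magnitude and opposite sign, which is consistent with the captions' statement that values closer to 0 indicate a lower prediction error.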

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 3 more