On the Fair Comparison of Optimization Algorithms in Different Machines

Etor Arza; Josu Ceberio; Ekhiñe Irurozki; Aritz Pérez

TL;DR

This work tackles the fair benchmarking of optimization algorithms across different machines when the source code of some algorithms is unavailable. It introduces an estimation framework that predicts the equivalent runtime $t_2$ on a target machine from a reference process and machine scores, with a bias-correction parameter $\gamma$ that bounds the overestimation probability $p_\gamma$. A modified one-sided sign test with a corrected p-value $\hat{p}_c$ is proposed to control the type I error under these runtime estimations. Validation across multiple CPUs and optimization tasks shows accurate runtime prediction and robust statistical conclusions. The methodology enables principled cross-machine comparisons and is complemented by practical scripts and tutorials, so that researchers can apply it without re-running the original algorithms.

Abstract

An experimental comparison of two or more optimization algorithms requires that the same computational resources be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed on the same machine if they are to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, which means that running the algorithms to be compared on the same machine is not always possible. Even when the code is available, some optimization algorithms might be costly to run, such as training large neural networks in the cloud. In this paper, we consider the following problem: how do we compare the performance of a new optimization algorithm B with a known algorithm A from the literature if we only have the results (the objective values) and the runtime of algorithm A on each instance? In particular, we present a methodology that enables a statistical analysis of the performance of algorithms executed on different machines. The proposed methodology has two parts. First, we propose a model that, given the runtime of an algorithm on one machine, estimates the runtime of the same algorithm on another machine. This model can be adjusted so that the probability of estimating a runtime longer than the true equivalent runtime is arbitrarily low. Second, we introduce an adaptation of the one-sided sign test that uses a modified p-value and takes that probability into account. This adaptation avoids increasing the probability of type I error associated with executing algorithms A and B on different machines.
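For readers unfamiliar with the test that the second part builds on, the following is a minimal sketch of the standard (uncorrected) one-sided sign test, assuming paired objective values of A and B on the same instances. The function name, the `minimize` flag, and the sample values are hypothetical and chosen only for illustration; the sketch does not implement the paper's corrected p-value, whose formula is not given in this overview.

```python
from math import comb


def one_sided_sign_test_p_value(scores_a, scores_b, minimize=True):
    """Standard one-sided sign test (not the paper's corrected version).

    H0: A and B are equally likely to win on an instance.
    H1: B wins more often than A.
    Ties are discarded, as is common for the sign test.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores_a and scores_b must be paired")

    wins_b = 0
    n = 0  # number of non-tied instances
    for a, b in zip(scores_a, scores_b):
        if a == b:
            continue  # drop ties
        n += 1
        # B wins if it is lower (minimization) or higher (maximization).
        if (b < a) == minimize:
            wins_b += 1

    if n == 0:
        return 1.0  # no evidence either way

    # P[Binomial(n, 1/2) >= wins_b], computed exactly with integers.
    tail = sum(comb(n, k) for k in range(wins_b, n + 1))
    return tail / 2 ** n


# Hypothetical objective values on 8 instances (lower is better).
values_a = [10, 12, 9, 15, 11, 14, 13, 16]
values_b = [8, 11, 9, 12, 10, 15, 11, 13]
p_value = one_sided_sign_test_p_value(values_a, values_b)
print(p_value)
```

In this toy data there is one tie and B wins 6 of the remaining 7 instances. A small p-value would be read as evidence that B outperforms A, provided both algorithms received equivalent resources, which is the condition the runtime estimation model is meant to secure.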

Paper Structure (25 sections, 4 theorems, 53 equations, 8 figures, 7 tables, 1 algorithm)

Introduction
The estimation model of the equivalent runtime
Controlling the probability of predicting a runtime longer than the true equivalent runtime
Validation
i) Predicting the equivalent runtime vs. using the same runtime
ii) Validation in other optimization processes and CPUs
Modifying the one-sided sign test
One-sided sign test
The corrected p-value
Applying the methodology
Example I
Example II
Limitations, applicability and future work
Multiple threads/cores
CPU as the only bottleneck
...and 10 more sections

Key Result

Lemma 1

Let $n$ be an integer, $X$ and $Y$ two random variables. Let $X_1,...,X_n$ be $n$ independent random variables distributed as $X$. Let $Y_1,...,Y_n$ be $n$ independent random variables distributed as $Y$. Let $v_x$ and $v_y$ be two possible outcomes of the random variables $X$ and $Y$ respectively, and II) If $\mathcal{P}[Y = v_y \ | \ X = v_x] = 1$ and $\mathcal{P}[X = v_x \ | \ Y = v_y] = 1$ t

Figures (8)

Figure 4: A comparison in estimation error of the predicted equivalent runtime (Equivalent runtime) and simply using the same runtime in both machines (Same runtime) with respect to the true equivalent runtime. The estimation error is measured as the log deviation ratio of the prediction of the equivalent runtime with respect to the true equivalent runtime. A value closer to 0 indicates a lower prediction error.
Figure 5: A comparison in estimation error of the equivalent runtime with the centered estimator. The estimation error for the optimization processes and CPUs used to fit the estimator (Train), and these new validation optimization processes and CPUs (Validation) are compared. The estimation error is measured as the log deviation ratio of the prediction of the equivalent runtime with respect to the true equivalent runtime. A value closer to 0 indicates a lower prediction error.
Figure: Diagram of the estimation of the equivalent runtime
Figure: PassMark single-thread score and the runtime $\rho'$
Figure: Estimated runtime and the correction parameter $\gamma$
...and 3 more figures
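
The captions of Figures 4 and 5 measure estimation error as a log deviation ratio. A minimal sketch of one plausible reading is given below, assuming the ratio is the natural logarithm of the predicted equivalent runtime divided by the true equivalent runtime; the exact definition used in the paper is not stated in this overview, and the function name and sample runtimes are hypothetical.

```python
import math


def log_deviation_ratio(predicted_runtime, true_runtime):
    """Assumed definition: log(predicted / true).

    0 means a perfect prediction, positive values mean the runtime
    was overestimated, negative values mean it was underestimated.
    """
    if predicted_runtime <= 0 or true_runtime <= 0:
        raise ValueError("runtimes must be positive")
    return math.log(predicted_runtime / true_runtime)


# Hypothetical runtimes in seconds.
print(log_deviation_ratio(2.0, 2.0))  # perfect prediction
print(log_deviation_ratio(4.0, 2.0))  # overestimate by a factor of 2
print(log_deviation_ratio(1.0, 2.0))  # underestimate by a factor of 2
```

Under this reading, overestimating and underestimating by the same factor give errors of equal magnitude and opposite sign, which matches the captions' statement that values closer to 0 indicate lower prediction error.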

Theorems & Definitions (13)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Lemma 1
Lemma 2
proof
Lemma 3
...and 3 more
