Theoretical Analysis of Explicit Averaging and Novel Sign Averaging in Comparison-Based Search
Daiki Morinaga, Youhei Akimoto
TL;DR
The paper studies how noise affects comparison-based black-box optimization and shows that explicit averaging can degrade the estimation of ground-truth rankings when the noise is heavy-tailed, and that it fails when the mean does not exist. Under a stable-distribution noise model with stability parameter $\alpha$, it quantifies the order estimation probability (OEP) and proves that explicit averaging is effective only for $\alpha\in(1,2]$, neutral at $\alpha=1$, and detrimental for $0<\alpha<1$. To address these limitations, the authors introduce sign averaging and prove that estimating the order of medians via sign comparisons remains reliable for all $\alpha\in(0,2]$ under symmetry and uniqueness assumptions. They also propose a practical weighting scheme for incorporating sign averaging into CMA-ES. Numerical experiments validate the theory: sign averaging often outperforms explicit averaging, especially for heavy-tailed noise, and the proposed weighting uses ranking information to keep optimization robust in noisy, comparison-based settings.
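The three regimes for explicit averaging are consistent with a standard scaling property of stable laws. As a sketch, assume i.i.d. symmetric $\alpha$-stable noise terms $\varepsilon_1,\dots,\varepsilon_n$ centered at zero (the symbols $\varepsilon_i$, $n$, and $\bar{\varepsilon}_n$ are introduced here for illustration and may differ from the paper's notation). The sample mean then satisfies

$$\bar{\varepsilon}_n = \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i \overset{d}{=} n^{1/\alpha-1}\,\varepsilon_1 .$$

The scale of the averaged noise therefore shrinks with $n$ when $\alpha>1$ (as $n^{-1/2}$ in the Gaussian case $\alpha=2$), is unchanged when $\alpha=1$ (the Cauchy case), and grows with $n$ when $\alpha<1$.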
Abstract
In black-box optimization, noise in the objective function is inevitable. Noise disrupts the ranking of candidate solutions in comparison-based optimization, possibly deteriorating the search performance compared with a noiseless scenario. Explicit averaging takes the sample average of noisy objective function values and is widely used as a simple and versatile noise-handling technique. Although it is suitable for various applications, it is ineffective if the mean is not finite. We theoretically reveal that explicit averaging has a negative effect on the estimation of ground-truth rankings when assuming stably distributed noise without a finite mean. As an alternative, sign averaging is proposed as a simple but robust noise-handling technique. We theoretically prove that sign averaging estimates the order of the medians of the noisy objective function values of a pair of points with arbitrarily high probability as the number of samples increases. Its advantages over explicit averaging and its robustness are also confirmed through numerical experiments.
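The contrast between the two techniques can be illustrated with a small Monte Carlo experiment. The following is a minimal sketch, not the authors' implementation: it assumes additive standard Cauchy noise ($\alpha=1$), two hypothetical objective values `f_x = 0.0` and `f_y = 1.0`, and one plausible reading of sign averaging in which the i-th noisy evaluation of one point is compared with the i-th evaluation of the other. All function names are made up for illustration.

```python
import numpy as np

def explicit_average_prefers_x(samples_x, samples_y):
    # Explicit averaging: compare the sample means of the noisy values.
    return np.mean(samples_x) < np.mean(samples_y)

def sign_average_prefers_x(samples_x, samples_y):
    # Sign averaging (assumed pairing of the i-th samples): average the
    # signs of the pairwise differences; a negative average means x tends
    # to have the smaller noisy value.
    return np.mean(np.sign(samples_x - samples_y)) < 0

def estimate_oep(decide, f_x, f_y, n_samples, n_trials, rng):
    # Empirical order estimation probability: fraction of trials in which
    # the decision rule ranks x before y (correct here because f_x < f_y).
    hits = 0
    for _ in range(n_trials):
        # Assumption: additive standard Cauchy noise, which has no finite mean.
        samples_x = f_x + rng.standard_cauchy(n_samples)
        samples_y = f_y + rng.standard_cauchy(n_samples)
        hits += bool(decide(samples_x, samples_y))
    return hits / n_trials

rng = np.random.default_rng(0)
f_x, f_y = 0.0, 1.0  # hypothetical noiseless objective values, f_x is better
oep_mean = estimate_oep(explicit_average_prefers_x, f_x, f_y, 100, 2000, rng)
oep_sign = estimate_oep(sign_average_prefers_x, f_x, f_y, 100, 2000, rng)
print(f"explicit averaging OEP: {oep_mean:.3f}")
print(f"sign averaging OEP:     {oep_sign:.3f}")
```

Under these assumptions, the mean of Cauchy samples is distributed like a single sample, so the explicit-averaging rate should not improve with `n_samples`, whereas the sign-averaging rate should approach 1 as `n_samples` grows.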
