Evolutionary Algorithms Are Significantly More Robust to Noise When They Ignore It
Denis Antipov, Benjamin Doerr
TL;DR
The paper addresses optimization under noisy objective evaluations without re-evaluations, focusing on the LeadingOnes benchmark. It provides rigorous runtime analyses of the $(1+1)$ EA without re-evaluations under the one-bit and bitwise noise models, yielding $O(n^2)$ runtime guarantees for constant noise levels and standard mutation settings. The findings contrast with prior analyses that assume re-evaluations, showing that re-evaluations can be detrimental and are not strictly necessary. Complementary experiments support the theory, demonstrating robustness close to the noiseless case even at substantial noise. The results suggest broader applicability to single-objective optimization and motivate rethinking evaluation strategies in noisy black-box settings.
Abstract
Randomized search heuristics (RSHs) are known to have a certain robustness to noise. Mathematical analyses that try to quantify rigorously how robust RSHs are to noisy access to the objective function typically assume that each solution is re-evaluated whenever it is compared to others. This aims to prevent a single noisy evaluation from having a lasting negative effect, but it is computationally expensive and requires the user to foresee that noise is present (in a noise-free setting, one would never re-evaluate solutions). In this work, we conduct the first mathematical runtime analysis of an evolutionary algorithm solving a single-objective noisy problem without re-evaluations. We prove that the $(1+1)$ evolutionary algorithm without re-evaluations can optimize the classic LeadingOnes benchmark with up to constant noise rates, in sharp contrast to the version with re-evaluations, where only noise with rates $O(n^{-2} \log n)$ can be tolerated. This result suggests that re-evaluations are much less needed than previously thought, and that they can actually be highly detrimental. The insights from our mathematical proofs indicate that similar results are plausible for other classic benchmarks.
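To make the setting concrete, the following is a minimal Python sketch of a $(1+1)$ EA on LeadingOnes under one-bit noise that stores the parent's noisy fitness once and never re-evaluates it. It is an illustration under stated assumptions, not the authors' implementation: the function names, the noise parameter `q`, the standard bit-wise mutation rate $1/n$, acceptance on ties, the evaluation of every offspring (including copies of the parent), and the use of the true fitness as a stopping criterion for measurement purposes are all choices made here and are not specified in the abstract.

```python
import random


def leading_ones(x):
    """True LeadingOnes value: number of consecutive 1-bits from the left."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def noisy_leading_ones(x, q, rng):
    """One-bit noise (assumed form): with probability q, evaluate a copy of x
    in which one uniformly chosen bit is flipped; x itself is not modified."""
    if rng.random() < q:
        y = list(x)
        y[rng.randrange(len(y))] ^= 1
        return leading_ones(y)
    return leading_ones(x)


def one_plus_one_ea_no_reeval(n, q, max_iters, seed=0):
    """(1+1) EA that keeps the parent's first noisy fitness value forever.

    Returns the number of iterations until the parent is the all-ones string,
    or None if that does not happen within max_iters iterations.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = noisy_leading_ones(x, q, rng)  # stored once, never re-evaluated
    for t in range(1, max_iters + 1):
        # Standard bit-wise mutation: flip each bit independently with prob. 1/n.
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = noisy_leading_ones(y, q, rng)
        if fy >= fx:  # compare against the stored (possibly wrong) value
            x, fx = y, fy
        # The true fitness is used only to measure the runtime, not by the EA.
        if leading_ones(x) == n:
            return t
    return None
```

Calling `one_plus_one_ea_no_reeval(n=50, q=0.3, max_iters=10**6)` for several seeds and comparing against `q=0` gives a rough empirical feel for the claimed robustness. A variant with re-evaluations would replace the stored `fx` by a fresh call to `noisy_leading_ones(x, q, rng)` in every iteration.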
