Runtime Analysis of the Compact Genetic Algorithm on the LeadingOnes Benchmark
Marcel Chwiałkowski, Benjamin Doerr, Martin S. Krejca
TL;DR
This work provides the first rigorous runtime analysis of the compact genetic algorithm (cGA) on the LeadingOnes benchmark. Using drift-analysis techniques, including multiplicative and negative drift theorems, the authors prove that in the low-genetic-drift regime (with $\mu = \Omega(n \log^2 n)$) the cGA finds the optimum with high probability within $O(\mu n \log n)$ iterations, which gives a runtime of $O(n^2 \log^3 n)$ for $\mu = \Theta(n \log^2 n)$. The analysis reveals nuanced differences between the dynamics of the cGA and the UMDA, notably due to the cGA's two-sample update, which leads to slightly slower adaptation despite similar high-level behavior. The results underscore the impact of sample size on drift and optimization speed, while leaving open the question of a matching lower bound. Overall, the paper advances the theoretical understanding of univariate EDAs on classic benchmarks and informs parameter choices for the cGA on LeadingOnes.
Abstract
The compact genetic algorithm (cGA) is one of the simplest estimation-of-distribution algorithms (EDAs). Next to the univariate marginal distribution algorithm (UMDA), another simple EDA, the cGA has been subject to extensive mathematical runtime analyses, often showcasing a similar or even superior performance to competing approaches. Surprisingly, to date and in contrast to the UMDA and many other heuristics, we lack a rigorous runtime analysis of the cGA on the LeadingOnes benchmark, one of the most studied theory benchmarks in the domain of evolutionary computation. We fill this gap in the literature by conducting a formal runtime analysis of the cGA on LeadingOnes. When the cGA's single parameter, called the hypothetical population size, is at least polylogarithmically larger than the problem size, we prove that the cGA samples the optimum of LeadingOnes with high probability within a number of function evaluations quasi-linear in the problem size and linear in the hypothetical population size. For the best hypothetical population size, our result matches, up to polylogarithmic factors, the typical quadratic runtime that many randomized search heuristics exhibit on LeadingOnes. Our analysis exhibits some noteworthy differences in the working principles of the cGA and the UMDA that were not visible in previous works.
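
To make the objects in the abstract concrete, the following is a minimal Python sketch of the cGA on LeadingOnes. It follows the common textbook formulation of the cGA, which this summary does not spell out, so the details are assumptions: frequencies start at 1/2, two samples are drawn per iteration, each frequency moves by 1/mu toward the better sample, and frequencies are capped to [1/n, 1 - 1/n]. The names `leading_ones`, `cga_leading_ones`, `max_iters`, and `rng` are illustrative and not taken from the paper.

```python
import random


def leading_ones(x):
    """Return the number of consecutive 1-bits at the start of x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def cga_leading_ones(n, mu, max_iters, rng=None):
    """Run a textbook-style cGA on LeadingOnes (assumes n >= 2).

    Returns the iteration in which the all-ones string was first sampled,
    or None if that did not happen within max_iters iterations.
    Each iteration uses two function evaluations.
    """
    rng = rng or random.Random()
    lower, upper = 1.0 / n, 1.0 - 1.0 / n  # assumed frequency borders
    p = [0.5] * n  # frequency vector of the probabilistic model

    for t in range(1, max_iters + 1):
        # Sample two solutions independently from the current model.
        x = [1 if rng.random() < p[i] else 0 for i in range(n)]
        y = [1 if rng.random() < p[i] else 0 for i in range(n)]
        fx, fy = leading_ones(x), leading_ones(y)
        if fx == n or fy == n:
            return t

        # Let x be the winner; ties keep x (an arbitrary choice).
        if fx < fy:
            x, y = y, x

        # Shift each frequency by 1/mu toward the winner where the samples differ.
        for i in range(n):
            if x[i] != y[i]:
                p[i] += (x[i] - y[i]) / mu
                p[i] = min(upper, max(lower, p[i]))

    return None
```

For instance, `cga_leading_ones(50, 2000, 10**6, random.Random(1))` runs one trial with hypothetical population size `mu = 2000`. The step size 1/mu is the lever the abstract refers to: a larger `mu` makes each update smaller, which reduces random fluctuations of the frequencies at the cost of more iterations.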
