Bayesian Optimization in AlphaGo
Yutian Chen, Aja Huang, Ziyu Wang, Ioannis Antonoglou, Julian Schrittwieser, David Silver, Nando de Freitas
TL;DR
Bayesian optimization was applied to tune AlphaGo's game-playing hyper-parameters across design iterations, yielding progressive strength gains and contributing to the match performance against Lee Sedol. The method uses Gaussian-process priors with the Expected Improvement acquisition function, implemented via Spearmint, and models the self-play win-rate estimated from 50 games per evaluation, which suits an objective that is non-differentiable and expensive to evaluate. Applied to five tuning tasks, the approach delivered substantial Elo gains that compounded over iterations, and it offered insights into parameter interactions and component contributions (e.g., fast roll-outs vs. value networks). This work demonstrates a practical, data-efficient strategy for optimizing complex, non-differentiable hyper-parameter spaces in large reinforcement learning systems and informs the development of future self-play agents.
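As a rough illustration of the two ingredients named above, the sketch below computes the standard closed-form Expected Improvement for a Gaussian posterior and the binomial standard error of a win-rate estimated from 50 games. It is not taken from Spearmint or from AlphaGo's code. The function names, the maximization convention, and the treatment of games as independent Bernoulli trials are assumptions made here for illustration.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Expected Improvement for maximization (hypothetical helper).

    mu, sigma: posterior mean and standard deviation of the objective
               (here, win-rate) at a candidate hyper-parameter setting.
    best:      best objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def win_rate_std_error(p, n_games=50):
    """Binomial standard error of a win-rate estimate, assuming
    independent games (an illustrative simplification)."""
    return math.sqrt(p * (1.0 - p) / n_games)


if __name__ == "__main__":
    # A candidate whose posterior mean equals the incumbent still has
    # positive EI because of its uncertainty.
    print(expected_improvement(mu=0.60, sigma=0.10, best=0.60))
    # Noise level of a 50-game evaluation near a 50% win-rate.
    print(win_rate_std_error(0.5, n_games=50))
```

Under these assumptions, a 50-game evaluation near a 50% win-rate carries a standard error of about 7 percentage points, which is one reason to use a surrogate model that accounts for observation noise rather than comparing raw win-rates.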
Abstract
During the development of AlphaGo, its many hyper-parameters were tuned with Bayesian optimization multiple times. This automatic tuning process resulted in substantial improvements in playing strength. For example, prior to the match with Lee Sedol, we tuned the latest AlphaGo agent and this improved its win-rate from 50% to 66.5% in self-play games. This tuned version was deployed in the final match. Of course, since we tuned AlphaGo many times during its development cycle, the compounded contribution was even higher than this percentage. It is our hope that this brief case study will be of interest to Go fans, and also provide Bayesian optimization practitioners with some insights and inspiration.
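To put the 50% to 66.5% improvement on a rating scale, a win-rate can be converted to an Elo difference. The sketch below assumes the standard logistic Elo model, in which a rating gap of d points gives an expected score of 1 / (1 + 10^(-d/400)). The abstract does not state this model, and the function name is hypothetical.

```python
import math


def elo_difference(win_rate):
    """Elo gap implied by a win-rate under the standard logistic Elo model
    (an assumption here, not stated in the abstract)."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    print(round(elo_difference(0.5)))    # 0: evenly matched
    print(round(elo_difference(0.665)))  # 119
```

Under this model, a 66.5% win-rate against the untuned agent corresponds to roughly 119 Elo points. If ratings are treated as additive across successive versions, gains from repeated tuning rounds add up, which is one way to read the compounded contribution mentioned above.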
