Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization
Kaichi Irie, Shuhei Watanabe, Masaki Onishi
TL;DR
The paper identifies a bottleneck in Bayesian optimization: batching acquisition-function optimization across multiple restarts (C-BE) speeds up evaluation but introduces off-diagonal artifacts in the quasi-Newton inverse-Hessian approximation, which slow convergence. It proposes Decoupled Batch Evaluations (D-BE), which uses a coroutine to decouple per-restart quasi-Newton updates from batched evaluations, preserving per-restart curvature while still exploiting hardware throughput. The method achieves convergence identical to sequential multi-start optimization (MSO) with substantially lower wall-clock time, outperforms C-BE, and is demonstrated on multiple benchmark functions, with speedups of up to 1.5x. The approach has been merged into GPSampler in Optuna, where it reduces the sampler's computational overhead.
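The "off-diagonal artifacts" can be made concrete with a short derivation. This is a sketch under stated assumptions: the symbols $\alpha$ (acquisition function), $B$ (number of restarts), and $D$ (dimension) are notation introduced here, and the BFGS inverse update is used for illustration because the summary only says "quasi-Newton".

```latex
% Summed objective over B restarts, with stacked variable X = (x_1, \dots, x_B) \in \mathbb{R}^{BD}
F(X) = \sum_{b=1}^{B} \alpha(x_b),
\qquad
\nabla^2 F(X) = \operatorname{blockdiag}\bigl(\nabla^2 \alpha(x_1), \dots, \nabla^2 \alpha(x_B)\bigr).

% BFGS inverse-Hessian update applied to the stacked variable
H_{k+1} = \bigl(I - \rho_k s_k y_k^\top\bigr)\, H_k\, \bigl(I - \rho_k y_k s_k^\top\bigr) + \rho_k s_k s_k^\top,
\qquad
\rho_k = \frac{1}{y_k^\top s_k},
\qquad
s_k = X_{k+1} - X_k,
\quad
y_k = \nabla F(X_{k+1}) - \nabla F(X_k).

% The (i, j) block of s_k s_k^\top is s_{k,i}\, s_{k,j}^\top, which is generally nonzero for i \neq j,
% whereas the true inverse Hessian of F has zero off-diagonal blocks.
```

Because restarts do not interact in $F$, any nonzero off-diagonal block in $H_k$ is pure approximation error; running one quasi-Newton state per restart avoids these blocks by construction.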
Abstract
Bayesian optimization (BO) efficiently finds high-performing parameters by maximizing an acquisition function, which quantifies how promising candidate parameters are. A major computational bottleneck arises in acquisition function optimization, where multi-start optimization (MSO) with quasi-Newton (QN) methods is required because the acquisition function is non-convex. BoTorch, a widely used BO library, currently optimizes the summed acquisition function over multiple points, which speeds up MSO through PyTorch batching. Nevertheless, this paper empirically demonstrates that this approach is suboptimal: it introduces off-diagonal approximation errors in the inverse Hessian of the QN method, which slow its convergence. To address this problem, we propose decoupling QN updates using a coroutine while batching the acquisition function calls. Our approach not only yields convergence theoretically identical to that of sequential MSO but also drastically reduces the wall-clock time compared to the previous approaches. Our approach is available in GPSampler in Optuna, effectively reducing its computational overhead.
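The idea of decoupling QN updates with a coroutine while batching evaluations can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses Python generators as coroutines, a plain BFGS with backtracking line search as the QN method, and a toy quartic as a stand-in for a negated acquisition function. The names `objective`, `bfgs_coroutine`, `run_decoupled`, and `run_sequential` are hypothetical.

```python
import numpy as np


def objective(X):
    """Toy non-convex stand-in for a negated acquisition function (hypothetical).

    Takes a batch X of shape (B, D); returns values (B,) and gradients (B, D).
    """
    X2 = X * X
    values = np.sum(X2 * X2 - 3.0 * X2 + X, axis=1)
    grads = 4.0 * X2 * X - 6.0 * X + 1.0
    return values, grads


def bfgs_coroutine(x0, max_iter=100, gtol=1e-6):
    """BFGS for one restart: yields a query point, receives (value, gradient)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)  # inverse-Hessian approximation owned by this restart only
    f, g = yield x
    for _ in range(max_iter):
        if np.linalg.norm(g) < gtol:
            break
        d = -H @ g
        step = 1.0
        # Backtracking (Armijo) line search; every trial point is sent to the driver.
        while True:
            x_new = x + step * d
            f_new, g_new = yield x_new
            if f_new <= f + 1e-4 * step * (g @ d):
                break
            step *= 0.5
            if step < 1e-10:
                return x, f  # line search failed; keep the current point
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:  # update only when the curvature condition holds
            rho = 1.0 / (y @ s)
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x, f, g = x_new, f_new, g_new
    return x, f


def run_decoupled(batched_fun, starts):
    """Drive all restarts in lockstep with one batched evaluation per round."""
    coros = [bfgs_coroutine(x0) for x0 in starts]
    pending = {i: next(c) for i, c in enumerate(coros)}  # first query of each restart
    results, n_batched_calls = {}, 0
    while pending:
        idx = list(pending)
        X = np.stack([pending[i] for i in idx])
        F, G = batched_fun(X)  # single batched call for all active restarts
        n_batched_calls += 1
        for i, f, g in zip(idx, F, G):
            try:
                pending[i] = coros[i].send((f, g))  # resume this restart's optimizer
            except StopIteration as stop:
                results[i] = stop.value  # restart finished: (x, f)
                del pending[i]
    return [results[i] for i in range(len(coros))], n_batched_calls


def run_sequential(batched_fun, starts):
    """Reference: the same optimizer, one restart at a time (batch size 1)."""
    results, calls = [], []
    for x0 in starts:
        res, n = run_decoupled(batched_fun, [x0])
        results.append(res[0])
        calls.append(n)
    return results, calls


rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=(8, 2))
dec_results, n_batched_calls = run_decoupled(objective, starts)
seq_results, seq_calls = run_sequential(objective, starts)
print(n_batched_calls, sum(seq_calls))
```

In this sketch each restart keeps its own `H`, so its iterates do not depend on which other restarts share the batch, and the number of batched calls equals the largest per-restart evaluation count rather than the sum. A production version would presumably wrap an existing bounded QN solver instead of the hand-written BFGS used here.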
