An efficient implementation of parallel simulated annealing algorithm in GPUs
A. M. Ferreiro, J. A. García, J. G. López-Salas, C. Vázquez
TL;DR
This work tackles the computationally expensive problem of box-constrained global minimization with a highly optimized CUDA implementation of parallel simulated annealing (SA) on GPUs. It systematically compares sequential SA, asynchronous parallel SA, and a synchronous parallel SA that exchanges information at every temperature level, and shows that the synchronous variant gives the best balance of convergence and computational cost in practice. The study combines a benchmark based on the normalized Schwefel function with a broader test suite of 41 functions, analyzes the effects of floating-point precision, memory-boundedness, thread count, and Markov-chain length, and also explores a hybrid SA/Nelder–Mead approach. The findings underscore the practical potential of GPU-accelerated SA for large-scale optimization in fields spanning biology, physics, engineering, and finance, and the authors plan to release the code as open source to ease access and reuse.
Abstract
In this work we propose a highly optimized version of a simulated annealing (SA) algorithm adapted to the recently developed Graphics Processing Units (GPUs). The implementation uses the CUDA toolkit, which is designed specifically for Nvidia GPUs. For this purpose, efficient versions of SA were first analyzed and adapted to GPUs. An appropriate sequential SA algorithm was developed as a starting point. Next, a straightforward asynchronous parallel version was implemented, followed by a specific and more efficient synchronous version. A broad benchmark suite is used to illustrate the performance of the implementation. Among all the tests, a classical sample problem, the minimization of the normalized Schwefel function, has been selected to compare the behavior of the sequential, asynchronous, and synchronous versions; the synchronous version offers the best balance between convergence, accuracy, and computational cost. A hybrid method combining SA with a local minimizer has also been implemented. Because SA is a generic algorithm, it can be applied to a wide set of real problems arising in a large variety of fields, such as biology, physics, engineering, finance, and industrial processes.
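To make the synchronous scheme concrete, the following is a minimal CPU-only sketch in Python, not the CUDA implementation described in this work. It assumes that, at each temperature level, every chain starts from the same point, runs a fixed-length Metropolis chain, and that the best final state is then selected as the common starting point for the next level. The form of the normalized Schwefel function, the single-coordinate uniform proposal, the geometric cooling factor `rho`, and all function names and default values are assumptions chosen for illustration. On a GPU each chain would map to one thread and the `min` would be a parallel reduction; here the chains simply run one after another.

```python
import math
import random


def schwefel_normalized(x):
    """Normalized Schwefel function (assumed form), box [-512, 512]^n.

    Its global minimum is about -418.98, reached near x_i = 420.97.
    """
    return -sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x) / len(x)


def metropolis_chain(f, x, fx, temp, length, lo, hi, rng):
    """Run one fixed-temperature Markov chain and return its final state."""
    x = list(x)  # work on a private copy, as each GPU thread would
    for _ in range(length):
        i = rng.randrange(len(x))
        old = x[i]
        x[i] = rng.uniform(lo, hi)  # propose a new value for one coordinate
        f_new = f(x)
        delta = f_new - fx
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            fx = f_new
        else:
            x[i] = old  # reject the move and restore the coordinate
    return x, fx


def synchronous_sa(f, dim, lo, hi, n_chains=32, chain_len=50,
                   t0=100.0, t_min=0.01, rho=0.9, seed=0):
    """Synchronous parallel SA sketch: chains exchange at every temperature."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(dim)]
    fx = f(x)
    best_x, best_f = list(x), fx
    temp = t0
    while temp > t_min:
        # All chains start from the same point (sequential stand-in for threads).
        results = [metropolis_chain(f, x, fx, temp, chain_len, lo, hi, rng)
                   for _ in range(n_chains)]
        # Synchronization step: keep the best final state (a reduction on a GPU).
        x, fx = min(results, key=lambda r: r[1])
        if fx < best_f:
            best_x, best_f = list(x), fx
        temp *= rho  # geometric cooling (assumed schedule)
    return best_x, best_f


if __name__ == "__main__":
    point, value = synchronous_sa(schwefel_normalized, dim=2, lo=-512.0, hi=512.0)
    print(point, value)
```

In this sketch an asynchronous variant would instead let each chain cool independently from its own state and compare results only once at the end, which removes the per-temperature exchange that the synchronous version relies on.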
