Modeling Local Search Metaheuristics Using Markov Decision Processes
Rubén Ruiz-Torrubiano
TL;DR
This work addresses the challenge of understanding and selecting among local search metaheuristics by introducing a framework that models each local search algorithm as a policy on a discrete-time, infinite-horizon Markov Decision Process (MDP) with explicit rewards. It defines measures of convergence and of the exploration-exploitation balance, notably the convergence coefficient $\gamma_i^A(t)$ and the exploration-exploitation coefficient $\delta_i^A$, and proves a local search exploration-exploitation theorem. Applying the framework to hill climbing and simulated annealing (SA) shows that hill climbing is exploitation-oriented while SA is balanced between exploration and exploitation, in line with established intuition. The approach provides a theory-grounded basis for choosing a metaheuristic for a given problem and can be extended to other local search and population-based methods.
Abstract
Local search metaheuristics such as tabu search and simulated annealing are popular heuristic optimization algorithms for finding near-optimal solutions to combinatorial optimization problems. However, it remains challenging for researchers and practitioners to analyze their behaviour and to choose systematically among the vast set of available metaheuristics for the particular problem at hand. In this paper, we introduce a theoretical framework based on Markov Decision Processes (MDPs) for analyzing local search metaheuristics. This framework not only helps in deriving convergence results for individual algorithms, but also provides an explicit characterization of the exploration-exploitation tradeoff and theory-grounded guidance for practitioners in choosing an appropriate metaheuristic. We present the framework in detail and show how to apply it to hill climbing and simulated annealing.
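To make the idea of a local search algorithm as a policy concrete, the following minimal Python sketch treats hill climbing and simulated annealing as two rules for choosing the next state from the current one. It is an illustration only and not the formal construction of the paper: the toy cost landscape `COSTS`, the function names, the geometric cooling schedule and its parameters `t0` and `cooling` are all assumptions made for this example, and the reward signal and the coefficients $\gamma_i^A(t)$ and $\delta_i^A$ are not modelled.

```python
import math
import random

# Hypothetical toy landscape: index = state, value = cost to minimize.
# Index 1 is a local minimum, index 6 is the global minimum.
COSTS = [5, 3, 4, 6, 2, 1, 0, 7]


def cost(state):
    return COSTS[state]


def neighbors(state):
    """Neighborhood: adjacent indices that lie inside the landscape."""
    return [s for s in (state - 1, state + 1) if 0 <= s < len(COSTS)]


def hill_climbing_policy(state, rng, temperature):
    """Greedy policy: move to the best neighbor only if it improves the cost."""
    best = min(neighbors(state), key=cost)
    return best if cost(best) < cost(state) else state


def simulated_annealing_policy(state, rng, temperature):
    """Metropolis policy: accept a worse neighbor with prob. exp(-delta / T)."""
    candidate = rng.choice(neighbors(state))
    delta = cost(candidate) - cost(state)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return candidate
    return state


def run(policy, start, steps, seed=0, t0=2.0, cooling=0.95):
    """Follow a policy for a fixed number of steps; return (final, best) states."""
    rng = random.Random(seed)
    state, best = start, start
    temperature = t0  # assumed geometric cooling, ignored by hill climbing
    for _ in range(steps):
        state = policy(state, rng, temperature)
        if cost(state) < cost(best):
            best = state
        temperature *= cooling
    return state, best


if __name__ == "__main__":
    print("hill climbing:", run(hill_climbing_policy, start=0, steps=50))
    print("simulated annealing:", run(simulated_annealing_policy, start=0, steps=50))
```

Started at index 0, the greedy policy moves to index 1 and stays there, since neither neighbor improves on it. The simulated annealing policy can accept worsening moves while the temperature is high, so it may leave that local minimum, although whether it does depends on the seed and the cooling parameters.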
