Dynamic operator management in meta-heuristics using reinforcement learning: an application to permutation flowshop scheduling problems

Maryam Karimi Mamaghan; Mehrdad Mohammadi; Wout Dullaert; Daniele Vigo; Amir Pirayesh

TL;DR

This work tackles operator management in meta-heuristics with two components: a dynamic portfolio that adapts which perturbation operators are available during the search, and a Q-learning-based adaptive operator selection mechanism that picks the most promising operator from that portfolio. The resulting framework, Dynamic QIG (DQIG), combines tabu-inspired portfolio updates, which temporarily suppress ineffective operators, with a $Q(s,a)$-driven policy that uses an $\epsilon$-greedy strategy to balance exploration and exploitation. It is applied to the permutation flowshop scheduling problem (PFSP) with iterated greedy (IG) as the base algorithm. Empirical results on PFSP benchmarks (Taillard and VRF-hard-large) show that DQIG significantly outperforms static-portfolio variants and several state-of-the-art IG algorithms in both optimality gap and convergence speed, often achieving negative relative deviation on challenging instances. Because its per-iteration components run in constant time, the framework's computational overhead does not grow with instance size. Performance is robust across diverse PFSP instances, which underscores the practical value of dynamic operator management in complex combinatorial optimization problems (COPs). The framework paves the way for more data-driven, adjustable search strategies that work effectively without expert-tuned operator sets.

Abstract

This study develops a framework based on reinforcement learning to dynamically manage a large portfolio of search operators within meta-heuristics. Borrowing the idea of tabu search, the framework adapts continuously by temporarily excluding less efficient operators and updating the portfolio composition during the search. A Q-learning-based adaptive operator selection mechanism selects the most suitable operator from the dynamically updated portfolio at each stage. Unlike traditional approaches, the proposed framework requires no expert input on the search operators, so users who are not experts in the problem domain can apply it effectively. The performance of the proposed framework is analyzed through an application to the permutation flowshop scheduling problem. The results demonstrate the superior performance of the proposed framework against state-of-the-art algorithms in terms of optimality gap and convergence speed.
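The interplay between the tabu-style portfolio and the Q-learning selector can be illustrated with a minimal sketch. This is not the authors' DQIG implementation: the class name `DynamicOperatorManager`, the parameter defaults, the string-valued states, and the rule "make an operator tabu whenever it fails to improve the solution" are all assumptions chosen for illustration. Only the general ingredients (a Q-table over state-operator pairs, $\epsilon$-greedy selection, temporary exclusion of weak operators) come from the text above.

```python
import random
from collections import defaultdict


class DynamicOperatorManager:
    """Illustrative sketch: epsilon-greedy Q-learning over a tabu-filtered portfolio."""

    def __init__(self, operators, alpha=0.1, gamma=0.9, epsilon=0.2,
                 tabu_tenure=5, seed=0):
        self.operators = list(operators)
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.epsilon = epsilon        # exploration probability (assumed value)
        self.tabu_tenure = tabu_tenure
        self.q = defaultdict(float)   # Q[(state, operator)], initialised to 0
        self.tabu = {}                # operator -> remaining excluded rounds
        self.rng = random.Random(seed)

    def portfolio(self):
        """Operators currently available; fall back to all if every one is tabu."""
        active = [op for op in self.operators if op not in self.tabu]
        return active or list(self.operators)

    def select(self, state):
        """Epsilon-greedy choice restricted to the current portfolio."""
        active = self.portfolio()
        if self.rng.random() < self.epsilon:
            return self.rng.choice(active)
        return max(active, key=lambda op: self.q[(state, op)])

    def update(self, state, op, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, o)] for o in self.operators)
        td_target = reward + self.gamma * best_next
        self.q[(state, op)] += self.alpha * (td_target - self.q[(state, op)])

    def refresh_portfolio(self, op, improved):
        """Age existing tabu entries, then exclude `op` if it did not improve."""
        self.tabu = {o: t - 1 for o, t in self.tabu.items() if t > 1}
        if not improved:
            self.tabu[op] = self.tabu_tenure
```

In a search loop, one would call `select`, apply the chosen operator, compute a reward from the change in objective value, and then call `update` and `refresh_portfolio`. Each of these calls touches only the operator list and the Q-table, so in this sketch their cost depends on the number of operators rather than on the instance size.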

Paper Structure (23 sections, 13 equations, 4 figures, 6 algorithms)

Introduction
Contributions of this paper
Application of the proposed framework
Background
Permutation flowshop scheduling problem
Iterated greedy (IG) algorithm
Reinforcement learning and Q-learning
The proposed operator management framework
Dynamic portfolio determination
Adaptive operator selection
Proposed operator management framework for PFSP
Complexity of the proposed DQIG framework
Experimental design
Experimental setting
Performance metrics
...and 8 more sections

Figures (4)

Figure 1: The procedure of the proposed operator management framework
Figure 2: Boxplot of DQIG and its variants based on RPD (%) for scale $t=120$ for each instance set of Taillard dataset. The mean of each algorithm is shown using red bullets, and the median value is given on the right side of each boxplot.
Figure 3: Boxplot of DQIG and the state-of-the-art algorithms based on RPD (%) for scale $t=120$ for each instance set of Taillard dataset
Figure 4: Boxplot of DQIG and the state-of-the-art algorithms based on normalized objective value over all instances of each dataset
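
The figures above report results as RPD (%). As a hedged illustration, the sketch below assumes the definition commonly used in the PFSP literature, the percentage gap between an algorithm's makespan and a reference (best-known) makespan; the exact reference value used in the paper is not stated in this summary, and the function name `rpd` is hypothetical.

```python
def rpd(makespan: float, best_known: float) -> float:
    """Relative percentage deviation of `makespan` from a reference value.

    Assumed definition: 100 * (makespan - best_known) / best_known.
    A negative result means the solution beats the reference value.
    """
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return 100.0 * (makespan - best_known) / best_known
```

Under this definition, a makespan of 1010 against a reference of 1000 gives an RPD of 1.0, while a makespan of 990 gives -1.0, which is how a negative relative deviation can arise when an algorithm improves on the reference solution.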
