Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking
Mirko Stappert, Bernhard Lutz, Janis Brammer, Dirk Neumann
TL;DR
The paper tackles the paint shop problem with flexible store and retrieve decisions in multi-lane buffers, with the objective of minimizing color changes. It provides an ILP formalization of the fully flexible variant and proves that greedy retrieval is optimal, which establishes a theoretical advantage over store-then-retrieve variants. The reinforcement learning approach trains a PPO agent with action masking and a one-hot state representation with lookahead ($K=5$) to learn policies for both storing and retrieving. Experiments on 170 instances across varied buffer shapes, color distributions, and initialization conditions show substantial reductions in color changes and robust performance. The paper also offers practical guidelines on applying masking, on stochastic policy evaluation, and on generalization to out-of-distribution scenarios.
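To make the role of action masking concrete, the following is a minimal sketch, not the authors' implementation. It assumes an action layout of "store in lane i" followed by "retrieve from lane i", a uniform lane capacity, and one possible reading of greedy retrieval: whenever the front car of some lane matches the last painted color, only such retrievals remain allowed. The names `action_mask` and `masked_softmax` are hypothetical.

```python
import math
from collections import deque


def action_mask(lanes, capacity, last_color, has_incoming):
    """Boolean mask over 2 * len(lanes) actions (assumed layout).

    Actions 0..L-1 store the next incoming car in lane i;
    actions L..2L-1 retrieve the front car of lane i.
    """
    n = len(lanes)
    # Storing needs an incoming car and free space in the lane.
    store = [has_incoming and len(lane) < capacity for lane in lanes]
    # Retrieving needs a non-empty lane (FIFO: only the front car is reachable).
    retrieve = [len(lane) > 0 for lane in lanes]
    # Assumed greedy rule: if a front car matches the last painted color,
    # restrict the agent to retrieving one of those cars.
    matches = [ok and lane[0] == last_color for ok, lane in zip(retrieve, lanes)]
    if any(matches):
        return [False] * n + matches
    return store + retrieve


def masked_softmax(logits, mask):
    """Softmax over the allowed actions; masked actions get probability 0."""
    if not any(mask):
        raise ValueError("no valid action available")
    peak = max(x for x, ok in zip(logits, mask) if ok)
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


lanes = [deque(["red", "blue"]), deque(), deque(["blue"])]
mask = action_mask(lanes, capacity=2, last_color="blue", has_incoming=True)
probs = masked_softmax([0.3, -1.0, 0.5, 2.0, 0.1, -0.7], mask)
```

In this example only the retrieval from the third lane survives the mask, so the policy places all probability mass on it regardless of the raw logits.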
Abstract
In the paint shop problem, an unordered incoming sequence of cars assigned to different colors has to be reshuffled with the objective of minimizing the number of color changes. To reshuffle the incoming sequence, manufacturers can employ a first-in-first-out multi-lane buffer system allowing store and retrieve operations. Prior studies have primarily focused on simple decision heuristics such as greedy rules, or on simplified problem variants that do not allow full flexibility when performing store and retrieve operations. In this study, we propose a reinforcement learning approach to minimize color changes for the flexible problem variant, where store and retrieve operations can be performed in an arbitrary order. After proving that greedy retrieval is optimal, we incorporate this finding into the model using action masking. Our evaluation, based on 170 problem instances with 2-8 buffer lanes and 5-15 colors, shows that our approach reduces color changes compared to existing methods by considerable margins depending on the problem size. Furthermore, we demonstrate the robustness of our approach to different buffer sizes and imbalanced color distributions.
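As an illustration of the objective described in the abstract, the sketch below replays a sequence of store and retrieve operations on first-in-first-out lanes and counts the resulting color changes. It is a toy example under stated assumptions: lane capacity is ignored, a color change is counted only between consecutive cars in the outgoing sequence, and the helper names `count_color_changes` and `replay` are hypothetical.

```python
from collections import deque


def count_color_changes(sequence):
    """Number of positions where consecutive cars differ in color."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


def replay(incoming, num_lanes, actions):
    """Apply ("store", lane) / ("retrieve", lane) actions to FIFO lanes.

    Returns the outgoing sequence. Lane capacity is not modeled here.
    """
    lanes = [deque() for _ in range(num_lanes)]
    pending = deque(incoming)
    outgoing = []
    for kind, lane in actions:
        if kind == "store":
            # The next incoming car joins the back of the chosen lane.
            lanes[lane].append(pending.popleft())
        else:
            # Only the front car of a lane can leave the buffer.
            outgoing.append(lanes[lane].popleft())
    return outgoing


incoming = ["R", "B", "R", "B"]
actions = [
    ("store", 0), ("store", 1), ("store", 0), ("store", 1),
    ("retrieve", 0), ("retrieve", 0), ("retrieve", 1), ("retrieve", 1),
]
outgoing = replay(incoming, num_lanes=2, actions=actions)
before = count_color_changes(incoming)
after = count_color_changes(outgoing)
```

Here the alternating incoming sequence has three color changes, while sorting the two colors into separate lanes and emptying one lane at a time leaves a single change. In the flexible variant, store and retrieve actions may be interleaved in any order rather than grouped as in this toy schedule.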
