Solving the Paint Shop Problem with Flexible Management of Multi-Lane Buffers Using Reinforcement Learning and Action Masking

Mirko Stappert, Bernhard Lutz, Janis Brammer, Dirk Neumann

TL;DR

The paper tackles the paint shop problem with flexible store/retrieve decisions in multi-lane buffers to minimize color changes. It provides an ILP formalization of the fully flexible variant and proves that greedy retrieval is optimal, establishing a theoretical advantage over store-then-retrieve variants. A reinforcement learning approach using PPO is developed, employing action masking and a one-hot state representation with lookahead ($K=5$) to learn policies for both storing and retrieving actions. Extensive experiments on 170 instances across varied buffer shapes, color distributions, and initialization conditions show substantial reductions in color changes and robust performance, with practical guidelines for applying masking, stochastic policy evaluation, and generalization to out-of-distribution scenarios.
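The one-hot state representation with lookahead can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layout (buffer slots first, then the next $K$ upstream cars, then the last retrieved color), the all-zero encoding for empty slots, and the names `one_hot` and `encode_state` are assumptions made here.

```python
def one_hot(color, num_colors):
    """Encode a color index as a one-hot list; None (empty slot) maps to all zeros."""
    vec = [0.0] * num_colors
    if color is not None:
        vec[color] = 1.0
    return vec


def encode_state(lanes, upstream, last_color, num_colors, width, lookahead=5):
    """Flatten buffer, upstream lookahead window, and last color into one vector.

    lanes:      list of lanes, each a list of color indices (front of lane first)
    upstream:   list of color indices of cars that have not been stored yet
    last_color: color index of the most recently retrieved car, or None
    """
    slots = []
    # Assumed layout: every lane is padded to the lane width with empty slots.
    for lane in lanes:
        slots.extend(list(lane) + [None] * (width - len(lane)))
    # Only the next `lookahead` upstream cars are visible to the agent.
    window = list(upstream[:lookahead])
    slots.extend(window + [None] * (lookahead - len(window)))
    slots.append(last_color)

    state = []
    for color in slots:
        state.extend(one_hot(color, num_colors))
    return state


state = encode_state(lanes=[[0, 1], [2]], upstream=[1, 2], last_color=0,
                     num_colors=3, width=3)
print(len(state))
```

With this layout the vector length is $(L \cdot W + K + 1) \cdot C$, so the input size is fixed per buffer shape regardless of how full the buffer is.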

Abstract

In the paint shop problem, an unordered incoming sequence of cars assigned to different colors has to be reshuffled with the objective of minimizing the number of color changes. To reshuffle the incoming sequence, manufacturers can employ a first-in-first-out multi-lane buffer system allowing store and retrieve operations. So far, prior studies primarily focused on simple decision heuristics like greedy or simplified problem variants that do not allow full flexibility when performing store and retrieve operations. In this study, we propose a reinforcement learning approach to minimize color changes for the flexible problem variant, where store and retrieve operations can be performed in an arbitrary order. After proving that greedy retrieval is optimal, we incorporate this finding into the model using action masking. Our evaluation, based on 170 problem instances with 2-8 buffer lanes and 5-15 colors, shows that our approach reduces color changes compared to existing methods by considerable margins depending on the problem size. Furthermore, we demonstrate the robustness of our approach towards different buffer sizes and imbalanced color distributions.
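The buffer mechanics and the masking idea can be illustrated with a small sketch. It assumes that "greedy retrieval" means: whenever the front car of some lane matches the color of the last retrieved car, retrieve it. The class `MultiLaneBuffer`, the function names, and the action indexing (store actions first, then retrieve actions) are hypothetical, and the mask shown is not necessarily one of the masks studied in the paper.

```python
from collections import deque


class MultiLaneBuffer:
    """First-in-first-out multi-lane buffer: store at the back, retrieve at the front."""

    def __init__(self, num_lanes, width):
        self.width = width
        self.lanes = [deque() for _ in range(num_lanes)]

    def can_store(self, lane):
        return len(self.lanes[lane]) < self.width

    def can_retrieve(self, lane):
        return len(self.lanes[lane]) > 0

    def store(self, lane, color):
        if not self.can_store(lane):
            raise ValueError("lane is full")
        self.lanes[lane].append(color)

    def retrieve(self, lane):
        if not self.can_retrieve(lane):
            raise ValueError("lane is empty")
        return self.lanes[lane].popleft()

    def front(self, lane):
        return self.lanes[lane][0] if self.lanes[lane] else None


def action_mask(buffer, has_upstream, last_color):
    """Return a boolean mask over 2L actions (assumed indexing).

    Actions 0..L-1 store the next upstream car into lane i;
    actions L..2L-1 retrieve the front car of lane i.
    """
    num_lanes = len(buffer.lanes)
    matching = [i for i in range(num_lanes)
                if last_color is not None and buffer.front(i) == last_color]
    if matching:
        # Assumed greedy rule: a retrieval without a color change is available,
        # so every other action is masked out.
        return [False] * num_lanes + [i in matching for i in range(num_lanes)]
    store = [has_upstream and buffer.can_store(i) for i in range(num_lanes)]
    retrieve = [buffer.can_retrieve(i) for i in range(num_lanes)]
    return store + retrieve


def count_color_changes(sequence):
    """Count positions where consecutive retrieved cars differ in color."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


buf = MultiLaneBuffer(num_lanes=2, width=2)
buf.store(0, "red")
buf.store(1, "blue")
print(action_mask(buf, has_upstream=True, last_color="blue"))
```

In a PPO setup, such a mask would typically be applied by setting the logits of masked actions to a large negative value before sampling, so the policy never selects them.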

Paper Structure

This paper contains 25 sections, 3 theorems, 22 equations, 5 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Consider the paint shop problem with more than one buffer lane ($L > 1$), more than one color ($C > 1$), and lane width $W > 1$. For every number $n \in \mathbb{N}$, there is an upstream sequence for which "store-then-retrieve" causes at least $n$ more color changes than "flexible storage and retrieval."
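
In symbols, the statement can be paraphrased as follows. The notation $\mathrm{CC}_{\mathrm{StR}}(s)$ and $\mathrm{CC}_{\mathrm{flex}}(s)$ for the number of color changes on an upstream sequence $s$ under "store-then-retrieve" and under "flexible storage and retrieval" is introduced here for illustration and is not taken from the paper.

$$
\forall n \in \mathbb{N} \;\; \exists s : \quad \mathrm{CC}_{\mathrm{StR}}(s) - \mathrm{CC}_{\mathrm{flex}}(s) \ge n \qquad (L > 1,\; C > 1,\; W > 1)
$$

Read this way, the theorem says that the gap between the two variants is not bounded by any constant.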

Figures (5)

  • Figure 1: Paint shop problem with a 4x5 buffer ($L=4$ lanes of width $W=5$) and corresponding notation. The binary decision variables are given by $x_{t,i}$ (storage) and $y_{t,i}$ (retrieval).
  • Figure 2: Illustration of the four considered action masks.
  • Figure 3: Evaluation results (color changes) of main analysis for best RL approach and baselines.
  • Figure 4: Evaluation results (color changes) of main analysis for different RL policy applications.
  • Figure 5: Comparison of the longest exclusive sequences of storage or retrieval operations between RL, simulated annealing, and Gurobi based on "store-then-retrieve" [wu2021mathematical]. Each plot shows the maximum exclusive sequence lengths of $n=10$ instances with 15 colors. The first and last sub-sequences were excluded when computing the maximum, so that the initial filling and the final retrieval do not count.

Theorems & Definitions (4)

  • Example 1
  • Theorem 1
  • Theorem 2
  • Corollary 2.1