Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling

Arthur Müller, Felix Grumbach, Matthia Sabatelli

TL;DR

This work analyzes how the batch size $b$ affects reinforcement learning-based production scheduling for a two-stage PFSSP with a central buffer. Building on prior work that used PPO with a fixed $b=50$, it evaluates a range of batch sizes and introduces curricula that enable training with small batches. It identifies a sweet spot at $b\approx 70$ that minimizes sample complexity, and notes that smaller batches can reduce setup times at the cost of longer training because the policy space expands. Extreme batch sizes underperform, and policy drift can occur with very small batches, which motivates Curricula B and C to stabilize learning and improve setup-effort performance, albeit with increased training time and data dependence. The findings provide practical guidelines for batch-size selection in real-world PFSSP-like problems and offer transferable curriculum strategies for similar scheduling tasks.

Abstract

Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In previous work, RL was used to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages linked by a central buffer. The RL agent was trained to sequence equally sized product batches to minimize setup efforts and idle times. However, the substantial impact of varying the size of these product batches has not yet been explored. In this follow-up study, we investigate the effects of varying batch sizes, exploring both the quality of solutions and the training dynamics of the RL agent. The results demonstrate that it is possible to methodically identify reasonable boundaries for the batch size. These boundaries are determined on one side by the increasing sample complexity associated with smaller batch sizes, and on the other side by the decreasing flexibility of the agent when dealing with larger batch sizes. This gives practitioners the ability to make an informed decision when selecting an appropriate batch size. Moreover, we introduce and investigate two new curriculum learning strategies to enable training with small batch sizes. The findings of this work offer the potential for application in several industrial use cases with comparable scheduling problems.

Paper Structure (16 sections, 6 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Model of the production system within a discrete-event simulator (adapted from Müller et al., Mueller2024).
  • Figure 2: Time steps needed to finish the training. Two separate effects cause an increase: As the batch size decreases, the sample complexity increases due to the increasing policy space. As the batch size increases, the sample complexity increases due to the decreasing flexibility of the agent, which makes it harder to find good solutions. The minimum is at $b=70$.
  • Figure 3: $|\mathcal{A}|^T$ over batch sizes for different average number of actions per step. $|\mathcal{A}|=8$, $T=300$ for $b=50$ and a planning horizon of one week. The y-axis is presented on a logarithmic scale. Due to limited computational capability, we could not perform calculations for values smaller than $b=40$.
  • Figure 4: Setup efforts across batch sizes, considering only zero-idle-time solutions.
  • Figure 5: Learning curve for a batch size of 10 (BS10 agent) and 20 (BS20 agent). The BS10 agent is drifting away from the intended behavior when entering task 3.
  • ...and 1 more figures
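
The quantity $|\mathcal{A}|^T$ from the Figure 3 caption can be evaluated in log space, which avoids handling the huge number directly. The sketch below is illustrative only and is not the authors' computation. It takes $|\mathcal{A}|=8$ and $T=300$ at $b=50$ from the caption, and it assumes that the number of steps scales inversely with the batch size (a fixed weekly volume split into batches). That scaling rule and the function name `log10_policy_space` are assumptions, not taken from the paper.

```python
import math


def log10_policy_space(batch_size, actions_per_step=8.0,
                       ref_batch_size=50, ref_steps=300):
    """Return log10(|A|^T) for a given batch size.

    Assumption (not from the paper): the episode length T scales
    inversely with the batch size, anchored at T=300 for b=50.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    steps = ref_steps * ref_batch_size / batch_size  # assumed T(b)
    # log10(|A|^T) = T * log10(|A|), so |A|^T itself is never formed
    return steps * math.log10(actions_per_step)


if __name__ == "__main__":
    for b in (10, 20, 40, 50, 70, 100):
        print(f"b={b:>3}: |A|^T is about 10^{log10_policy_space(b):.1f}")
```

At the reference point $b=50$ this gives $300 \cdot \log_{10} 8 \approx 271$, that is, about $10^{271}$ action sequences. Under the assumed scaling, halving the batch size doubles the exponent, which matches the caption's point that the policy space grows rapidly as batches shrink.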