Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling
Arthur Müller, Felix Grumbach, Matthia Sabatelli
TL;DR
This work analyzes how the batch size $b$ affects reinforcement learning-based production scheduling for a two-stage permutation flow shop scheduling problem (PFSSP) with a central buffer. Building on prior work that used PPO with a fixed $b=50$, it evaluates a range of batch sizes, introduces curricula that enable training with small batches, and identifies a sweet spot near $b\approx 70$ that minimizes sample complexity. Smaller batches can reduce setup times, but they expand the policy space and therefore lengthen training. Extreme batch sizes underperform, and very small batches can cause policy drift; this motivates Curricula B and C, which stabilize learning and improve setup-effort performance at the cost of increased training time and data dependence. The findings provide practical guidelines for batch-size selection in real-world PFSSP-like problems and offer curriculum strategies that transfer to similar scheduling tasks.
Abstract
Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In previous work, RL was used to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages linked by a central buffer. The RL agent was trained to sequence equally sized product batches to minimize setup efforts and idle times. However, the substantial impact of varying the size of these product batches has not yet been explored. In this follow-up study, we investigate the effects of varying batch sizes, examining both the quality of solutions and the training dynamics of the RL agent. The results demonstrate that it is possible to methodically identify reasonable boundaries for the batch size. These boundaries are determined on one side by the increasing sample complexity associated with smaller batch sizes, and on the other side by the decreasing flexibility of the agent when dealing with larger batch sizes. This enables practitioners to make an informed decision when selecting an appropriate batch size. Moreover, we introduce and investigate two new curriculum learning strategies that enable training with small batch sizes. The findings of this work may be applicable to several industrial use cases with comparable scheduling problems.
