Gradient-based Learning in State-based Potential Games for Self-Learning Production Systems
Steve Yuwono, Marlon Löppenberg, Dorothea Schwung, Andreas Schwung
TL;DR
The paper tackles slow, exploration-heavy training in state-based potential games (SbPGs) for distributed production systems by introducing gradient-based learning methods. It develops three gradient-estimation variants (basic Newton divided differences, momentum-augmented, and polynomial interpolation) to handle unknown or nonconvex utilities, and adds a kick-off random exploration phase to accelerate initial learning. Validated on the Bulk Good Laboratory Plant (BGLP), the approach achieves comparable or better global objectives than best response learning, with significantly shorter training times and lower energy consumption. The results indicate that gradient-guided exploration can provide faster convergence and smoother policy updates, with practical implications for fast, scalable optimization in smart manufacturing. Future work includes extending gradient-based SbPGs to model-based settings and to other game structures such as Stackelberg games, broadening their applicability in industrial distributed optimization.
Abstract
In this paper, we introduce novel gradient-based optimization methods for state-based potential games (SbPGs) within self-learning distributed production systems. SbPGs are recognized for their efficacy in enabling self-optimizing distributed multi-agent systems and offer a proven convergence guarantee, which facilitates collaborative player efforts towards global objectives. Our study aims to replace the conventional ad-hoc, random exploration-based learning in SbPGs with contemporary gradient-based approaches, which target faster convergence and smoother exploration dynamics, thereby shortening training duration while upholding the efficacy of SbPGs. Moreover, we propose three distinct variants for estimating the objective function in gradient-based learning, each developed to suit the unique characteristics of the systems under consideration. To validate our methodology, we apply it to a laboratory testbed, the Bulk Good Laboratory Plant, which represents a smart and flexible distributed multi-agent production system. Incorporating gradient-based learning in SbPGs reduces training times and yields better policies than the baseline.
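The TL;DR names Newton divided differences and a momentum-augmented variant as ways to estimate gradients of an unknown utility. The following is a minimal illustrative sketch of that general idea for a single player with a scalar action, not the authors' implementation: the function names, the learning rate `lr`, the momentum factor `beta`, the action bounds, and the quadratic test utility are all assumptions made for this example.

```python
def divided_difference(a_prev, u_prev, a_curr, u_curr, eps=1e-9):
    """First-order Newton divided difference (u_curr - u_prev) / (a_curr - a_prev).

    Returns 0.0 when the two actions are (almost) identical, to avoid
    dividing by zero. The eps guard is an assumption of this sketch.
    """
    da = a_curr - a_prev
    if abs(da) < eps:
        return 0.0
    return (u_curr - u_prev) / da


def gradient_ascent(utility, a0, a1, lr=0.1, beta=0.0, steps=200, lo=0.0, hi=1.0):
    """Maximize a black-box utility using only sampled (action, utility) pairs.

    The gradient is estimated from the two most recent samples.
    beta = 0.0 gives the plain update; beta > 0.0 adds a momentum term.
    """
    a_prev, a_curr = a0, a1
    u_prev, u_curr = utility(a_prev), utility(a_curr)
    velocity = 0.0
    for _ in range(steps):
        grad = divided_difference(a_prev, u_prev, a_curr, u_curr)
        velocity = beta * velocity + lr * grad
        # Keep the action inside its admissible range [lo, hi].
        a_next = min(hi, max(lo, a_curr + velocity))
        a_prev, u_prev = a_curr, u_curr
        a_curr, u_curr = a_next, utility(a_next)
    return a_curr


# Hypothetical utility, used only for illustration: peak at action 0.7.
def toy_utility(a):
    return -(a - 0.7) ** 2


best_action = gradient_ascent(toy_utility, a0=0.2, a1=0.25)
```

The two starting actions `a0` and `a1` stand in for samples that an initial random exploration phase would provide. In an SbPG each player would run such an update per state on its own utility, which this single-player sketch does not model.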
