Gradient-based Learning in State-based Potential Games for Self-Learning Production Systems

Steve Yuwono; Marlon Löppenberg; Dorothea Schwung; Andreas Schwung

TL;DR

The paper tackles slow, exploration-heavy training in state-based potential games (SbPGs) for distributed production systems by introducing gradient-based learning methods. It develops three gradient-estimation variants (basic Newton divided differences, a momentum-augmented version, and polynomial interpolation) to handle unknown or nonconvex utilities, and adds a kick-off phase of random exploration to accelerate initial learning. Validated on the Bulk Good Laboratory Plant (BGLP), the approach achieves comparable or better global objectives than best response learning, with significantly shorter training times and lower energy consumption. The results indicate that gradient-guided exploration can provide faster convergence and smoother policy updates, with practical implications for fast, scalable optimization in smart manufacturing. Future work includes extending gradient-based SbPGs to model-based settings and to other game structures such as Stackelberg games, broadening their applicability in industrial distributed optimization.
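To make the summary concrete, the following is a minimal sketch of gradient ascent driven by Newton's first divided difference, with optional momentum and a random kick-off. It is an illustration under stated assumptions, not the paper's implementation: it uses a single scalar action with no state dependence or performance map, and the names `divided_difference`, `gradient_ascent`, `lr`, and `beta` are hypothetical. The polynomial interpolation variant is not sketched here.

```python
import random


def divided_difference(a_prev, u_prev, a_curr, u_curr, eps=1e-8):
    """Newton's first divided difference: slope between two (action, utility) samples."""
    da = a_curr - a_prev
    if abs(da) < eps:
        # The two actions coincide, so no slope can be estimated.
        return 0.0
    return (u_curr - u_prev) / da


def gradient_ascent(utility, a_min=0.0, a_max=1.0, lr=0.1, beta=0.0,
                    steps=200, seed=0):
    """Maximise an unknown scalar utility using only sampled utility values.

    beta = 0.0 gives the basic variant; beta > 0 adds a momentum term.
    All parameter names and defaults are assumptions for illustration.
    """
    rng = random.Random(seed)
    # Kick-off: two random actions provide the first pair of samples.
    a_prev = rng.uniform(a_min, a_max)
    u_prev = utility(a_prev)
    a_curr = rng.uniform(a_min, a_max)
    u_curr = utility(a_curr)
    velocity = 0.0
    for _ in range(steps):
        grad = divided_difference(a_prev, u_prev, a_curr, u_curr)
        velocity = beta * velocity + lr * grad
        # Keep the next action inside the admissible range.
        a_next = min(a_max, max(a_min, a_curr + velocity))
        a_prev, u_prev = a_curr, u_curr
        a_curr, u_curr = a_next, utility(a_next)
    return a_curr


if __name__ == "__main__":
    # Hypothetical concave utility with its maximum at a = 0.7.
    toy_utility = lambda a: -(a - 0.7) ** 2
    print(gradient_ascent(toy_utility))            # basic variant
    print(gradient_ascent(toy_utility, beta=0.5))  # momentum variant
```

The design point the sketch tries to convey is that the player never needs the utility function in closed form: each update reuses the two most recent (action, utility) observations to estimate a slope, which is why a short random kick-off is needed before the first gradient step.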

Abstract

In this paper, we introduce novel gradient-based optimization methods for state-based potential games (SbPGs) within self-learning distributed production systems. SbPGs are recognized for their efficacy in enabling self-optimizing distributed multi-agent systems and offer a proven convergence guarantee, which facilitates collaborative player efforts towards global objectives. Our study aims to replace conventional ad-hoc random exploration-based learning in SbPGs with contemporary gradient-based approaches, which target faster convergence and smoother exploration dynamics, thereby shortening training duration while upholding the efficacy of SbPGs. Moreover, we propose three distinct variants for estimating the objective function of gradient-based learning, each developed to suit the unique characteristics of the systems under consideration. To validate our methodology, we apply it to a laboratory testbed, namely the Bulk Good Laboratory Plant, which represents a smart and flexible distributed multi-agent production system. The incorporation of gradient-based learning in SbPGs reduces training times and achieves better policies than its baseline.

Paper Structure (17 sections, 15 equations, 6 figures, 1 table)

Introduction
Preliminary Research
Game Theory for Distributed Optimization
Gradient-based Optimization
Fundamentals of State-based Potential Games and Best Response Learning
State-based Potential Games
Best Response Learning
Gradient-based Learning in State-based Potential Games
Gradient Ascent with Newton's First Divided Difference Method
Gradient Ascent with Newton's First Divided Difference Method and Momentum
Gradient Ascent with Newton's First Divided Difference Method of Polynomial Interpolation
Kick-Off with Random Exploration
Results and Discussions
Bulk Good Laboratory Plant
Benchmark: SbPGs with Best Response Learning
...and 2 more sections

Figures (6)

Figure 1: An overview of self-learning mechanism in distributed production systems using a game structure of SbPGs.
Figure 2: Learning methods during exploration in SbPGs.
Figure 3: An example of a 5 × 5 performance map with a 2D-state space in SbPGs with gradient-based learning.
Figure 4: An illustration of kick-off procedure in SbPGs with gradient-based learning.
Figure 5: Bulk Good Laboratory Plant.
...and 1 more figures
