Leveraging Graph Neural Networks and Multi-Agent Reinforcement Learning for Inventory Control in Supply Chains

Niki Kotecha, Antonio del Rio Chanona

TL;DR

The paper addresses inventory control in complex, uncertain supply chains with a graph-based multi-agent reinforcement learning framework: MAPPO (Multi-Agent Proximal Policy Optimization) agents whose actions are the parameters of an (s,S) inventory policy. The supply chain is represented as a graph and processed by three-layer GCNs with global mean pooling, which enables coordinated decisions under limited information sharing while keeping execution decentralized. A regularized variant, Reg-P-GCN-MAPPO, injects Gaussian noise into the value function to improve exploration and reduce overfitting. The methods are validated on four supply chain configurations, where they show robust profits and scale well, especially with larger numbers of agents. The work targets adaptive inventory management in decentralized, graph-structured environments and provides code for reproduction.

Abstract

Inventory control in modern supply chains has attracted significant attention due to the increasing number of disruptive shocks and the challenges posed by complex dynamics, uncertainties, and limited collaboration. Traditional methods, which often rely on static parameters, struggle to adapt to changing environments. This paper proposes a Multi-Agent Reinforcement Learning (MARL) framework with Graph Neural Networks (GNNs) for state representation to address these limitations. Our approach redefines the action space by parameterizing heuristic inventory control policies, making it adaptive as the parameters dynamically adjust based on system conditions. By leveraging the inherent graph structure of supply chains, our framework enables agents to learn the system's topology, and we employ a centralized learning, decentralized execution scheme that allows agents to learn collaboratively while overcoming information-sharing constraints. Additionally, we incorporate global mean pooling and regularization techniques to enhance performance. We test the capabilities of our proposed approach on four different supply chain configurations and conduct a sensitivity analysis. This work paves the way for utilizing MARL-GNN frameworks to improve inventory management in complex, decentralized supply chain environments.
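
To make the parameterized (s,S) policy named in the TL;DR concrete, here is a minimal Python sketch of the classic rule: when the inventory position drops to the reorder point s or below, order enough to reach the order-up-to level S. The function `actor_output_to_order`, the `max_level` bound, and the rescaling from [-1, 1] are hypothetical illustrations of how raw policy outputs could be turned into (s, S); the paper's actual post-processing may differ.

```python
from typing import Tuple


def order_quantity(inventory_position: float, s: float, S: float) -> float:
    """Classic (s, S) rule: order up to S once the position is at or below s."""
    if S < s:
        raise ValueError("S must be at least s")
    if inventory_position <= s:
        return S - inventory_position
    return 0.0


def actor_output_to_order(
    raw: Tuple[float, float], inventory_position: float, max_level: float
) -> float:
    """Hypothetical post-processing step (an assumption, not the paper's code).

    Two raw actor outputs are clipped to [-1, 1], rescaled to [0, max_level],
    and sorted so that the smaller value becomes s and the larger becomes S.
    """
    scaled = [(min(max(x, -1.0), 1.0) + 1.0) / 2.0 * max_level for x in raw]
    s, S = min(scaled), max(scaled)
    return order_quantity(inventory_position, s, S)


# Position 3 is below s = 5, so the node orders up to S = 20.
low_stock_order = order_quantity(3.0, 5.0, 20.0)
# Position 10 is above s = 5, so nothing is ordered.
high_stock_order = order_quantity(10.0, 5.0, 20.0)
```

Because the agent chooses s and S at every step instead of a raw order quantity, the heuristic stays interpretable while its parameters can adapt to current conditions, which is the adaptivity the abstract describes.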

Paper Structure

This paper contains 23 sections, 16 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Multi-Agent Reinforcement Learning with Graph Neural Networks for Inventory Management – A decentralized policy learning approach where each warehouse optimizes its inventory decisions using independent neural network policies $\pi_1, \pi_2, \pi_3$ while a Graph Neural Network (GNN) captures spatial dependencies between them, enabling coordinated decision-making across the supply chain.
  • Figure 2: Schematic showcasing the Centralized Training Decentralized Execution framework, demonstrating how agents undergo collaborative training in a centralized manner (blue dashed line) while executing actions independently in their respective environments (red dashed line).
  • Figure 3: Schematic showing the inventory flow between two nodes in an inventory management system.
  • Figure 4: Neural network architecture for actor, illustrating sampling from a Gaussian distribution, followed by a post-processing step, leveraging an inventory heuristic policy to generate actions in a continuous action space.
  • Figure 5: Illustration of the Graph Convolutional Network (GCN) architecture, representing the Graph Module of our framework. The GCN takes the adjacency matrix and node feature matrix as input, processes them through three convolutional layers, each with ReLU activation, and outputs the embedded node matrix representing the learned representations of the nodes.
  • ...and 12 more figures
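
The graph module described in the Figure 5 caption (adjacency matrix and node feature matrix in, three convolutional layers with ReLU, embedded node matrix out) can be sketched in a few lines of NumPy, followed by the global mean pooling mentioned in the TL;DR. This is a sketch under stated assumptions, not the authors' implementation: the symmetric normalization with self-loops is the common Kipf and Welling form, and the three-node undirected chain, the layer widths, and the random weights are made up for illustration.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}; this normalization is an assumption."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj: np.ndarray, features: np.ndarray, weights: list) -> np.ndarray:
    """Apply one graph convolution plus ReLU per weight matrix."""
    a_norm = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)  # aggregate neighbours, transform, ReLU
    return h


def global_mean_pool(node_embeddings: np.ndarray) -> np.ndarray:
    """Average node embeddings into a single graph-level vector."""
    return node_embeddings.mean(axis=0)


rng = np.random.default_rng(0)

# Hypothetical three-node chain (for example supplier, warehouse, retailer).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = rng.normal(size=(3, 4))        # 4 illustrative features per node
weights = [rng.normal(size=(4, 8)),       # layer 1
           rng.normal(size=(8, 8)),       # layer 2
           rng.normal(size=(8, 2))]       # layer 3

embeddings = gcn_forward(adj, features, weights)  # embedded node matrix
pooled = global_mean_pool(embeddings)             # graph-level summary
```

With three layers, each node's embedding can draw on information from nodes up to three hops away. The pooled vector is one way a shared, graph-level summary could be formed from the node embeddings.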