A Deep Q-Network Based on Radial Basis Functions for Multi-Echelon Inventory Management

Liqiang Cheng, Jun Luo, Weiwei Fan, Yidong Zhang, Yuan Li

TL;DR

The paper tackles dynamic ordering in complex multi-echelon inventory networks and introduces a radial basis function–based deep Q-network (RBF-DQN) to reduce neural-network design complexity and hyperparameter tuning. The method uses a three-layer RBF network where hidden neurons correspond to lattice-state points and kernel activations measure state similarity, enabling efficient Q-function approximation with reduced architectural tuning. Empirical results show near-optimal performance in a serial one-warehouse-one-retailer setup and clear superiority over the base-stock policy in multi-echelon scenarios, with competitive or better performance than existing DRL approaches like neuro-dynamic programming and A3C. The approach also demonstrates faster design and training compared to some DRL methods, highlighting practical potential for real-world inventory management problems with complex network topologies.
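
The three-layer structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth, the lattice levels, the toy weights, and all function names are assumptions chosen only to show how hidden neurons placed on lattice-state points turn state similarity into Q-value estimates.

```python
import math
from itertools import product


def gaussian_kernel(distance, bandwidth=1.0):
    """Assumed kernel: k(d) = exp(-d^2 / (2 * bandwidth^2))."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def lattice_centers(levels_per_dim):
    """Lattice-state points: Cartesian product of the per-dimension levels."""
    return [tuple(point) for point in product(*levels_per_dim)]


def rbf_q_values(state, centers, weights, bandwidth=1.0):
    """Forward pass: input state -> kernel activations -> linear output.

    weights[a][i] is the output weight from hidden neuron i to action a.
    """
    # Hidden layer: one neuron per lattice point, activated by similarity.
    activations = [
        gaussian_kernel(math.dist(state, center), bandwidth)
        for center in centers
    ]
    # Output layer: one weighted sum of activations per action.
    return [
        sum(w * phi for w, phi in zip(action_weights, activations))
        for action_weights in weights
    ]


# Hypothetical two-dimensional state: (warehouse inventory, retailer inventory).
centers = lattice_centers([[0, 5, 10], [0, 5, 10]])  # 9 hidden neurons
weights = [[0.0] * len(centers) for _ in range(3)]   # 3 candidate actions
weights[1][4] = 2.0  # toy value attached to the lattice point (5, 5)

q_at_center = rbf_q_values((5, 5), centers, weights)
q_nearby = rbf_q_values((5, 6), centers, weights)
```

In this sketch only the output weights would be learned; the hidden layer is fixed by the lattice, which is one way to read the claim that the architecture needs less tuning.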

Abstract

This paper addresses a multi-echelon inventory management problem with a complex network topology, where deriving optimal ordering decisions is difficult. Deep reinforcement learning (DRL) has recently shown potential in solving such problems, but designing the neural networks in DRL remains a challenge. To address this, a DRL model is developed whose Q-network is based on radial basis functions. The approach is easier to construct than classic DRL models based on neural networks, which alleviates the computational burden of hyperparameter tuning. A series of simulation experiments demonstrates the superior performance of this approach compared with the simple base-stock policy: it produces a better policy in the multi-echelon system and performs competitively in the serial system, where the base-stock policy is optimal. In addition, the approach outperforms current DRL approaches.
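
For readers unfamiliar with the benchmark, a base-stock (order-up-to) policy can be sketched as follows. The single-location setting, zero lead time, backlogged demand, and all names and numbers here are simplifying assumptions for illustration; they are not the paper's simulation setup.

```python
def base_stock_order(inventory_position, base_stock_level):
    """Order just enough to raise the inventory position to the target level."""
    return max(0, base_stock_level - inventory_position)


def simulate_base_stock(demands, base_stock_level, initial_inventory=0):
    """Toy single-location run: zero lead time, unmet demand is backlogged."""
    inventory = initial_inventory
    orders = []
    for demand in demands:
        order = base_stock_order(inventory, base_stock_level)
        orders.append(order)
        # Order arrives immediately, then demand is served; negative = backlog.
        inventory = inventory + order - demand
    return orders, inventory


# Hypothetical demand sequence and base-stock level.
orders, final_inventory = simulate_base_stock([3, 7, 2], base_stock_level=10)
```

Under these assumptions each order simply replaces the previous period's demand, which is why the policy is easy to implement but cannot adapt to the state of the rest of the network.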

Paper Structure

This paper contains 9 sections, 16 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Inventory of the multi-echelon system at time point $t$.
  • Figure 2: The procedure of the multi-echelon inventory management simulation for one period, between time points $t$ and $t+1$.
  • Figure 3: Structure of a deep Q-network. In the RBF-based deep Q-network, $\rho_i(s_t)=\left\| s_t - s_i \right\|$ is the Euclidean distance and the activation function $\varphi[\rho_i(s_t)]=k(\left\| s_t - s_i \right\|)$ is a kernel function. In a deep Q-network built from other neural networks, $\rho_i(s_t)=\theta^T_i s_t +b^h_i$ is a linear transformation of the input $s_t$, and the activation function is typically the sigmoid function or the Rectified Linear Unit (ReLU) function.
  • Figure 4: Average cost evolution during training.
  • Figure 5: Average cost evolution during training.