Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment

Yi Zheng, Zehao Li, Peng Jiang, Yijie Peng

TL;DR

This work addresses the mismatch in decision frequencies between pricing and replenishment by employing a two-timescale stochastic approximation scheme, which ensures convergence to a local optimum, and by proposing a fast-slow dual-agent DRL algorithm.

Abstract

We study the dynamic pricing and replenishment problem under inconsistent decision frequencies. Unlike the traditional demand assumption, demand here is discrete and follows a Poisson distribution whose parameter is a function of price, which complicates the analysis of the problem's properties. We demonstrate the concavity of the single-period profit function with respect to product price and inventory within their respective domains. The demand model is enhanced by integrating a decision tree-based machine learning approach, trained on comprehensive market data. Employing a two-timescale stochastic approximation scheme, we address the discrepancy in decision frequencies between pricing and replenishment and ensure convergence to a local optimum. We further refine our methodology by incorporating deep reinforcement learning (DRL) techniques and propose a fast-slow dual-agent DRL algorithm. In this approach, two agents handle pricing and inventory and are updated on different timescales. Numerical results from both single-product and multi-product scenarios validate the effectiveness of our methods.
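
As a rough illustration of the two-timescale idea mentioned in the abstract, the Python sketch below runs noisy gradient ascent on a toy concave objective with two coupled variables. Everything in it is an assumption made for illustration: the objective, the noise level, the step-size exponents, and the names `noisy_grad` and `two_timescale_ascent` do not come from the paper, and the sketch takes no position on whether pricing or replenishment plays the fast role. It only shows the defining feature of such schemes: the slow step size shrinks faster than the fast one, so their ratio tends to zero.

```python
import random


def noisy_grad(fast, slow, rng, noise_std=0.1):
    """Noisy gradient of the toy concave objective (an assumption, not the paper's profit):
    f(fast, slow) = -(fast - 1)^2 - (slow - 2)^2 - 0.5 * fast * slow
    """
    g_fast = -2.0 * (fast - 1.0) - 0.5 * slow + rng.gauss(0.0, noise_std)
    g_slow = -2.0 * (slow - 2.0) - 0.5 * fast + rng.gauss(0.0, noise_std)
    return g_fast, g_slow


def two_timescale_ascent(num_steps=20000, seed=0):
    """Stochastic gradient ascent with two step-size sequences a_n and b_n."""
    rng = random.Random(seed)
    fast, slow = 0.0, 0.0
    for n in range(1, num_steps + 1):
        a_n = n ** -0.6   # fast step size: sum a_n diverges, sum a_n^2 converges
        b_n = 1.0 / n     # slow step size: b_n / a_n = n^(-0.4) -> 0
        g_fast, g_slow = noisy_grad(fast, slow, rng)
        fast += a_n * g_fast   # fast iterate tracks its best response to `slow`
        slow += b_n * g_slow   # slow iterate sees a nearly equilibrated `fast`
    return fast, slow


fast_final, slow_final = two_timescale_ascent()
print(f"fast = {fast_final:.3f}, slow = {slow_final:.3f}")
```

For this toy objective the stationary point solves $2x + 0.5y = 2$ and $0.5x + 2y = 4$, that is $(x, y) = (8/15, 28/15)$, so the printed iterates should land close to those values.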

Paper Structure

This paper contains 29 sections, 16 theorems, 66 equations, 12 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

The Poisson distribution $D(p) \sim \mathrm{Pois}(\lambda(p))$, where $\lambda(p)$ is the parameter, does not fall under the traditional demand assumption described in Eq. (\ref{eq:new10}): $d(p,\epsilon) = \gamma(p)\epsilon + \delta(p)$, with $\epsilon$ being a random variable that is independent of $p$.
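
One way to build intuition for this statement, offered as an illustration and not as the paper's proof, is through skewness. If $d(p,\epsilon) = \gamma(p)\epsilon + \delta(p)$ with $\epsilon$ independent of $p$, then demand at any price is an affine transformation of the same random variable, so the absolute value of its skewness cannot change with $p$. A Poisson variable has skewness $\lambda^{-1/2}$, which does change whenever $\lambda(p)$ does. The Python sketch below checks this numerically; the linear rate `demand_rate` and its coefficients are hypothetical and chosen only for the example.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at k, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_skewness(lam, k_max=200):
    """Skewness from the pmf, truncated at k_max (the tail mass is negligible here)."""
    ks = range(k_max + 1)
    pmf = [poisson_pmf(k, lam) for k in ks]
    mean = sum(k * w for k, w in zip(ks, pmf))
    var = sum((k - mean) ** 2 * w for k, w in zip(ks, pmf))
    third = sum((k - mean) ** 3 * w for k, w in zip(ks, pmf))
    return third / var ** 1.5


def demand_rate(price, a=20.0, b=2.0):
    """Hypothetical linear rate lambda(p) = a - b * p (an assumption for illustration)."""
    return a - b * price


skew_low_price = poisson_skewness(demand_rate(2.0))    # lambda = 16
skew_high_price = poisson_skewness(demand_rate(8.0))   # lambda = 4
print(skew_low_price, skew_high_price)
```

The two values are approximately $1/\sqrt{16} = 0.25$ and $1/\sqrt{4} = 0.5$. Because they differ, no choice of $\gamma(p)$, $\delta(p)$ and a price-independent $\epsilon$ can reproduce both demand distributions.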

Figures (12)

  • Figure 1: Poisson distribution and its truncated version
  • Figure 2: Eq. (\ref{eq:13}) and the second term of Eq. (\ref{eq:11}) without the quadratic term, with $\lambda=5$
  • Figure 3: Structure of actor network
  • Figure 4: Comparison of reward under different scenarios
  • Figure 5: Pricing and beginning inventory under different policies
  • ...and 7 more figures

Theorems & Definitions (35)

  • Proposition 1
  • Lemma 1
  • Proposition 2
  • Proof
  • Remark 1
  • Proposition 3
  • Lemma 2
  • Remark 2
  • Theorem 1
  • Proposition 4
  • ...and 25 more