A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy

Jiameng Lyu, Jinxing Xie, Shilin Yuan, Yuan Zhou

TL;DR

This paper proposes a novel minibatch-SGD-based meta-policy that applies to a general inventory systems framework covering a wide range of inventory management problems with a myopic clairvoyant optimal policy. It achieves a regret bound of $\mathcal{O}(\sqrt{T})$ in the general convex case and $\mathcal{O}(\log T)$ in the strongly convex case.

Abstract

Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often faces the pitfall of an infeasible target inventory level, that is, a target lower than the current inventory level. Several recent works (e.g., Huh and Rusmevichientong (2009), Shi et al. (2016)) have successfully resolved this issue in various inventory systems. However, their techniques are rather sophisticated and difficult to apply to more complicated scenarios such as multi-product and multi-constraint inventory systems. In this paper, we address the infeasible-target-inventory-level issue from a new technical perspective: we propose a novel minibatch-SGD-based meta-policy. Our meta-policy is flexible enough to be applied to a general inventory systems framework covering a wide range of inventory management problems with a myopic clairvoyant optimal policy. By devising the optimal minibatch scheme, our meta-policy achieves a regret bound of $\mathcal{O}(\sqrt{T})$ for the general convex case and $\mathcal{O}(\log T)$ for the strongly convex case. To demonstrate the power and flexibility of our meta-policy, we apply it to three important inventory control problems by carefully designing application-specific subroutines: multi-product and multi-constraint systems, multi-echelon serial systems, and one-warehouse and multi-store systems. We also conduct extensive numerical experiments to demonstrate that our meta-policy enjoys competitive regret performance, high computational efficiency, and low variance across a wide range of applications.
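
To make the infeasible-target issue concrete, the following is a minimal sketch of a minibatch SGD update for a single-product newsvendor with backlogged demand. It is an illustration under stated assumptions, not the paper's meta-policy: the function names, the epoch schedule (epoch $k$ lasts $k$ periods), the step size `step / epoch`, and the cap `y_max` are all hypothetical choices made for this example. The target is held fixed within an epoch and updated once at the end using the averaged gradient; when the current inventory exceeds the target, the sketch simply orders nothing.

```python
import random


def newsvendor_gradient(y, demand, h, b):
    """Subgradient in y of the cost h*(y - d)^+ + b*(d - y)^+."""
    return h if y > demand else -b


def minibatch_sgd_newsvendor(demands, h=1.0, b=5.0, y0=0.0, y_max=100.0, step=10.0):
    """Order-up-to policy whose target changes only at epoch boundaries.

    Assumed (illustrative) schedule: epoch k lasts k periods and uses
    step size step / k. Returns the target and the post-ordering
    inventory level for every period.
    """
    target, inventory = y0, 0.0
    targets, levels = [], []
    t, epoch = 0, 1
    while t < len(demands):
        grads = []
        for _ in range(epoch):
            if t >= len(demands):
                break
            # Orders cannot be negative: if inventory is above the
            # target, the target is infeasible and we order nothing.
            level = max(inventory, target)
            d = demands[t]
            # Demand is fully observed under backlogging, so the
            # gradient at the target is available even if level > target.
            grads.append(newsvendor_gradient(target, d, h, b))
            inventory = level - d  # negative inventory means backlog
            targets.append(target)
            levels.append(level)
            t += 1
        # One projected gradient step per epoch with the averaged gradient.
        avg_grad = sum(grads) / len(grads)
        target = min(max(target - (step / epoch) * avg_grad, 0.0), y_max)
        epoch += 1
    return targets, levels


random.seed(0)
demands = [random.uniform(0, 100) for _ in range(2000)]
targets, levels = minibatch_sgd_newsvendor(demands)
# Periods in which the target was below the on-hand inventory.
infeasible_periods = sum(1 for y, x in zip(targets, levels) if x > y)
```

Because the target moves only once per epoch and the averaged gradient has lower variance than a single sample, the target drops less often and by smaller amounts than under per-period SGD, which is the intuition for why batching helps with feasibility. For this cost, the clairvoyant optimum is the $b/(b+h)$ quantile of the demand distribution.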

Paper Structure

This paper contains 52 sections, 28 theorems, 149 equations, 11 figures, 3 tables, 4 algorithms.

Key Result

Lemma 1

Suppose that $F({\bm w})= \mathbb{E} [f({\bm w};\xi)]$ is convex and $\beta$-smooth satisfying $\max_{{\bm w}\in\Gamma}\|\nabla F({\bm w})\|_2\leq G$ and the bounded variance property $\mathbb{E} [\|\nabla f({\bm w};\xi)-\nabla F({\bm w})\|_2^2]\leq \sigma^2$, and $\Gamma$ be a bounded convex set sa

Figures (11)

  • Figure 1: A multi-echelon serial inventory system with $n$ stages. Parameters $h_i$ and $b_i$ are specified in the problem formulation section for this application.
  • Figure 2: Inventory system with one warehouse and $n$ stores. Parameters $h_i$, $c_i$, and $b_i$ are specified in the problem formulation section for this application.
  • Figure EC.1: NVP: Comparison with Different Algorithms under Different Distributions
  • Figure EC.2: NVP: Comparison with Different Algorithms under Different $b$ (Left Panel: $b=5$, Right Panel: $b = 25$)
  • Figure EC.3: NVP: Variance of Regret and Distance
  • ...and 6 more figures

Theorems & Definitions (36)

  • Definition 1: Well-behaved Gradient Estimator
  • Definition 2: Transition Solver
  • Lemma 1
  • Remark 1
  • Lemma 2
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Remark 3
  • Remark 4
  • ...and 26 more