A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy

Jiameng Lyu; Jinxing Xie; Shilin Yuan; Yuan Zhou

TL;DR

This paper proposes a novel minibatch-SGD-based meta-policy that applies to a general framework of inventory systems, covering a wide range of inventory management problems whose clairvoyant optimal policy is myopic. It achieves a regret bound of $\mathcal{O}(\sqrt{T})$ in the general convex case and $\mathcal{O}(\log T)$ in the strongly convex case.

Abstract

Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often runs into the pitfall of an infeasible target inventory level, that is, a target lower than the current inventory level. Several recent works (e.g., Huh and Rusmevichientong (2009), Shi et al. (2016)) have successfully resolved this issue in various inventory systems. However, their techniques are rather sophisticated and difficult to apply to more complicated scenarios such as multi-product and multi-constraint inventory systems. In this paper, we address the infeasible-target-inventory-level issue from a new technical perspective: we propose a novel minibatch-SGD-based meta-policy. Our meta-policy is flexible enough to be applied to a general framework of inventory systems covering a wide range of inventory management problems whose clairvoyant optimal policy is myopic. By devising the optimal minibatch scheme, our meta-policy achieves a regret bound of $\mathcal{O}(\sqrt{T})$ for the general convex case and $\mathcal{O}(\log T)$ for the strongly convex case. To demonstrate the power and flexibility of our meta-policy, we apply it to three important inventory control problems by carefully designing application-specific subroutines: multi-product and multi-constraint systems, multi-echelon serial systems, and one-warehouse and multi-store systems. We also conduct extensive numerical experiments to demonstrate that our meta-policy enjoys competitive regret performance, high computational efficiency, and low variance across a wide range of applications.
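To make the minibatch idea concrete, the following is a minimal sketch for a single-product newsvendor-style system with carried-over inventory. It is not the paper's meta-policy: the function name `minibatch_sgd_newsvendor`, the fully observed demand, the batch sizes, and the step sizes are all assumptions chosen for illustration. The target level is held fixed within each batch, one projected SGD step is taken per batch using the averaged subgradient, and the sketch counts the periods in which on-hand inventory exceeds the target (the infeasible-target situation described above).

```python
import numpy as np

def minibatch_sgd_newsvendor(demands, h, b, y0, y_max, batch_sizes, step_sizes):
    """Illustrative sketch (assumed setup, not the authors' algorithm).

    demands     : sequence of fully observed per-period demands (assumption)
    h, b        : per-unit holding and shortage costs
    y0, y_max   : initial target level and upper bound of the feasible set [0, y_max]
    batch_sizes : number of periods the target is held fixed in each batch
    step_sizes  : SGD step size used at the end of each batch
    """
    y = float(y0)        # current target (order-up-to) level
    inventory = 0.0      # on-hand inventory carried into the next period
    t = 0
    targets = []         # target level after each batch update
    infeasible = 0       # periods where carried inventory exceeds the target

    for n_k, eta_k in zip(batch_sizes, step_sizes):
        grads = []
        for _ in range(n_k):
            if t >= len(demands):
                break
            if inventory > y:
                infeasible += 1
            # We can only order up, never down: the realized level is the max.
            level = max(y, inventory)
            d = demands[t]
            # Subgradient of h*(y-d)^+ + b*(d-y)^+ evaluated at the target y.
            grads.append(h if d < y else -b)
            inventory = max(level - d, 0.0)
            t += 1
        if not grads:
            break
        # One projected SGD step per batch, using the averaged subgradient.
        y = min(max(y - eta_k * float(np.mean(grads)), 0.0), y_max)
        targets.append(y)

    return np.array(targets), infeasible
```

With, say, demands drawn uniformly from $[0, 100]$, $h = 1$, and $b = 3$, the targets in this sketch drift toward the $b/(b+h) = 0.75$ demand quantile. Holding the target fixed over longer batches gives carried-over inventory time to deplete before the next update, which is the intuition the sketch is meant to convey.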

Paper Structure (52 sections, 28 theorems, 149 equations, 11 figures, 3 tables, 4 algorithms)

Introduction
Our Contributions
Related Works
Notations
Organization
A General Framework for Inventory Systems
A Minibatch-SGD-Based Meta-Policy
Regret Analysis of the Meta-Policy
Minibatch SGD and its Regret Analysis
The Main Theorems
Application I: Multi-Product and Multi-Constraint Inventory System
Problem Formulation
Design of the Well-behaved Gradient Estimator and the Transition Solver
Regret Analysis
Application II: Multi-Echelon Serial Inventory System
...and 37 more sections

Key Result

Lemma 1

Suppose that $F({\bm w})= \mathbb{E} [f({\bm w};\xi)]$ is convex and $\beta$-smooth satisfying $\max_{{\bm w}\in\Gamma}\|\nabla F({\bm w})\|_2\leq G$ and the bounded variance property $\mathbb{E} [\|\nabla f({\bm w};\xi)-\nabla F({\bm w})\|_2^2]\leq \sigma^2$, and $\Gamma$ be a bounded convex set sa

Figures (11)

Figure 1: A multi-echelon serial inventory system with $n$ stages. Parameters $h_i$ and $b_i$ will be specified in Section \ref{app2:problem formulation}.
Figure 2: Inventory system with one warehouse and $n$ stores. Parameters $h_i$, $c_i$, and $b_i$ are specified in Section \ref{app3:problem formulation}.
Figure EC.1: NVP: Comparison with Different Algorithms under Different Distributions
Figure EC.2: NVP: Comparison with Different Algorithms under Different $b$ (Left Panel: $b=5$, Right Panel: $b = 25$)
Figure EC.3: NVP: Variance of Regret and Distance
...and 6 more figures

Theorems & Definitions (36)

Definition 1: Well-behaved Gradient Estimator
Definition 2: Transition Solver
Lemma 1
Remark 1
Lemma 2
Remark 2
Theorem 1
Theorem 2
Remark 3
Remark 4
...and 26 more
