VC Theory for Inventory Policies

Yaqi Xie; Will Ma; Linwei Xin

TL;DR

This paper combines trajectory-based RL with policy regularization that imposes classical inventory structure, so that base-stock and $(s,S)$ policies can be learned from sampled demand sequences drawn from an arbitrary distribution, with no independence assumption across time. The authors study three policy classes: stationary base-stock policies $\Pi_S$, $(s,S)$ policies $\Pi_{(s,S)}$, and non-stationary base-stock policies $\Pi_{(S^t)}$. They bound the generalization error of each class using Rademacher complexity together with VC-type dimensions (the Pseudo-dimension and Pseudo$_\gamma$-dimension), and derive the corresponding sample complexities. The generalization error of $\Pi_S$ and $\Pi_{(S^t)}$ does not grow with the horizon $T$, while that of $\Pi_{(s,S)}$ grows only logarithmically in $T$; lower bounds indicate that these results are tight in several regimes. The paper also compares ERM with PERM under independent and correlated demands, reporting data-efficiency benefits when temporal structure is preserved and greater robustness to dependence when ERM is restricted to a policy class $\Pi$. Numerical experiments illustrate the theory, and the results give practical guidance on policy class selection and data requirements for data-driven inventory management.

Abstract

There has been growing interest in applying reinforcement learning (RL) to inventory management, either by optimizing over temporal transitions or by learning directly from full historical demand trajectories. This contrasts sharply with classical data-driven approaches, which first estimate demand distributions from past data and then compute well-structured optimal policies via dynamic programming. This paper considers a hybrid approach that combines trajectory-based RL with policy regularization imposing base-stock and $(s, S)$ structures. We provide generalization guarantees for this combined approach for several well-known classes in a $T$-period dynamic inventory model, using tools from the celebrated Vapnik-Chervonenkis (VC) theory, such as the Pseudo-dimension and Fat-shattering dimension. Our results have implications for regret against the best-in-class policies, and allow for an arbitrary distribution over demand sequences, with no assumptions such as independence across time. Surprisingly, we prove that the class of policies defined by $T$ non-stationary base-stock levels exhibits a generalization error that does not grow with $T$, whereas the two-parameter $(s, S)$ policy class has a generalization error growing logarithmically with $T$. Overall, our analysis leverages specific inventory structures within the learning theory framework, and improves sample complexity guarantees even compared to existing results assuming independent demands.
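To make the hybrid approach concrete, the following is a minimal sketch of empirical risk minimization over a stationary base-stock class on full demand trajectories. It is an illustration under simplifying assumptions that are not taken from the paper: zero lead time, backlogged demand, no fixed ordering cost, hypothetical unit costs `h` and `b`, and a plain grid search standing in for whatever optimization routine the authors use. The function names are also hypothetical.

```python
def base_stock_loss(S, demands, h=1.0, b=9.0):
    """Average per-period cost of ordering up to level S on one demand trajectory.

    Assumptions (not from the paper): zero lead time, backlogging,
    holding cost h per unit, backlog cost b per unit, no fixed cost.
    """
    x = 0.0        # inventory position at the start of the horizon
    total = 0.0
    for d in demands:
        y = max(x, S)   # order up to S whenever inventory is below S
        x = y - d       # unmet demand is backlogged (x may go negative)
        total += h * max(x, 0.0) + b * max(-x, 0.0)
    return total / len(demands)


def erm_base_stock(trajectories, grid, h=1.0, b=9.0):
    """Return the grid point minimizing the average loss over sampled trajectories."""
    def empirical_loss(S):
        losses = [base_stock_loss(S, traj, h, b) for traj in trajectories]
        return sum(losses) / len(losses)

    S_hat = min(grid, key=empirical_loss)
    return S_hat, empirical_loss(S_hat)


# Two sampled demand trajectories of length T = 4 (made-up numbers).
trajectories = [[4, 6, 5, 5], [5, 5, 7, 3]]
grid = [float(s) for s in range(11)]
S_hat, loss = erm_base_stock(trajectories, grid)
print(S_hat, loss)
```

Each trajectory enters the loss as a whole, so nothing in the procedure relies on demands being independent across periods. Restricting the search to a single parameter `S` plays the role of the structural regularization described in the abstract.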

Paper Structure (34 sections, 12 theorems, 60 equations, 10 figures, 2 tables)

Introduction
Theoretical Results: (Nearly) Horizon-free Estimation Error
Practical Insights and Further Evidence from Simulations
Literature Review
Comparison with Existing Theoretical Results
Further Related Work
Model and Preliminaries
Estimation Error
Approximation Error
Rademacher Complexity, VC-Dimension, Pseudo-Dimension, and Pseudo$_\gamma$-Dimension
Theoretical Results
The Class of Stationary Base-Stock Policies
The Class of ($s, S$) Policies: Upper Bound
The Class of ($s, S$) Policies: Lower Bound
The Class of Non-Stationary Base-Stock Policies
...and 19 more sections

Key Result

Proposition 1

For any distribution $\mathcal{D}\in \Delta([0, U]^{T+L})$, sample size $N\in\mathbb{Z}_{>0}$, and function class $\mathcal{F}(\Pi) = \left\{f(\pi, \cdot): \pi\in \Pi\right\}$, the following inequality holds:
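For orientation, statements of this kind are usually instances of the textbook symmetrization bound. The display below is that standard bound written in assumed notation: $d^i$ for the $i$-th sampled demand sequence, $\sigma_i$ for i.i.d. signs uniform on $\{-1, +1\}$, and $\mathfrak{R}_N$ for the expected Rademacher complexity are symbols introduced here, not taken from the source. The paper's Proposition 1 may differ in constants or form.

```latex
\mathbb{E}_{d^1,\dots,d^N \sim \mathcal{D}}
\left[\sup_{\pi\in\Pi}\left(
  \mathbb{E}_{d\sim\mathcal{D}}\big[f(\pi, d)\big]
  - \frac{1}{N}\sum_{i=1}^{N} f(\pi, d^i)
\right)\right]
\le 2\,\mathfrak{R}_N\big(\mathcal{F}(\Pi)\big),
\qquad
\mathfrak{R}_N\big(\mathcal{F}(\Pi)\big)
= \mathbb{E}\left[\sup_{\pi\in\Pi}\frac{1}{N}\sum_{i=1}^{N}\sigma_i\, f(\pi, d^i)\right]
```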

Figures (10)

Figure 1: The expected EE's relative to the optimal loss, and OOS loss ratios, of $\Pi_S$, $\Pi_{(s, S)}$ and $\Pi_{(S^t)}$ policies when $K=0$. Note that for $T=1$, the OOS loss ratio of $\Pi_S$ and $\Pi_{(S^t)}$ (equivalent) is 1.029.
Figure 2: The OOS loss ratios of $\Pi_{(s,S)}$ and $\Pi_{S}$ when $K>0$. Subfigures (b), (c), and (d) present the results for different values of $T$, $K$, and $\sigma_0$, respectively, compared with (a).
Figure 3: The OOS loss ratios of $\Pi_{(S^t)}$ and $\Pi_{S}$ when $K=0$. Subfigures (b), (c), and (d) present the results for different values of $T$, $\mathsf{Nonst}$, and $\sigma_0$, respectively, compared with (a).
Figure 4: The OOS ratios of ERM approaches compared to PERM approach under independent demands.
Figure 5: The OOS ratios of ERM approaches compared to PERM approach under correlated demands.
...and 5 more figures

Theorems & Definitions (22)

Definition 1: Rademacher complexity
Proposition 1
Definition 2: Pseudo-dimension
Proposition 2
Definition 3: Pseudo$_\gamma$-dimension
Proposition 3
Theorem 1 (proof under the paper's label pf:thm-base)
Corollary 1
Remark 1
Theorem 2 (proof under the paper's label pf:thm-ss-ub)
...and 12 more
