Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales

Jinyang Jiang, Jinhui Han, Yijie Peng, Ying Zhang

TL;DR

This work tackles cross-functional coordination of inventory replenishment and personalized product recommendations under interdependent demand and lead-time dynamics. It develops a model-based theoretical analysis of cross-product and intertemporal synergies and translates these insights into a model-free, multi-timescale, multi-agent (MTMA) reinforcement learning framework with two agents, one for replenishment and one for recommendations. The MTMA RL algorithm comes with convergence guarantees and trains scalably via a centralized critic with decentralized execution and PPO-style updates with memory. Simulations show profit gains and agent behavior aligned with managerial intuition. The results indicate that coordinating decision-making across departments yields substantial improvements in profitability and operational stability, highlighting the practical impact of modular, interpretable RL for complex organizational settings.

Abstract

Effective cross-functional coordination is essential for enhancing firm-wide profitability, particularly in the face of growing organizational complexity and scale. Recent advances in artificial intelligence, especially in reinforcement learning (RL), offer promising avenues to address this fundamental challenge. This paper proposes a unified multi-agent RL framework tailored for joint optimization across distinct functional modules, exemplified via coordinating inventory replenishment and personalized product recommendation. We first develop an integrated theoretical model to capture the intricate interplay between these functions and derive analytical benchmarks that characterize optimal coordination. The analysis reveals synchronized adjustment patterns across products and over time, highlighting the importance of coordinated decision-making. Leveraging these insights, we design a novel multi-timescale multi-agent RL architecture that decomposes policy components according to departmental functions and assigns distinct learning speeds based on task complexity and responsiveness. Our model-free multi-agent design improves scalability and deployment flexibility, while multi-timescale updates enhance convergence stability and adaptability across heterogeneous decisions. We further establish the asymptotic convergence of the proposed algorithm. Extensive simulation experiments demonstrate that the proposed approach significantly improves profitability relative to siloed decision-making frameworks, while the behaviors of the trained RL agents align closely with the managerial insights from our theoretical model. Taken together, this work provides a scalable, interpretable RL-based solution to enable effective cross-functional coordination in complex business settings.

Paper Structure

This paper contains 24 sections, 8 theorems, 65 equations, 14 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

For the considered simplified system, the optimal replenishment quantities $q^{i,*}$ are uniquely determined given each product's recommendation decision $\alpha^i$. As the recommendation intensity $\alpha^i$ for product $i$ increases, its optimal replenishment quantity also increases, while the rep

Figures (14)

  • Figure 1: Illustration of the dynamic interaction between the inventory and recommendation systems.
  • Figure 2: Recommendation decision regions depending on RME and RMP.
  • Figure 3: Joint policy optimization via stochastic approximation. $(x[n], y[n])$ denotes the joint policy at step $n$; $(x^*, y^*)$ indicates the globally optimal policy; and $(x^*(y), y)$ represents the optimal fast-timescale response $x$ given a fixed slow-timescale policy $y$. Panels (a), (b), and (c) use the same number of iterations.
  • Figure 4: Comparison of decision-making pipelines under different policy parameterizations.
  • Figure 5: Learning curves under different algorithmic configurations. MTMA denotes the proposed multi-timescale multi-agent RL approach; STMA-F and STMA-S are single-timescale multi-agent baselines using shared fast or slow step sizes, respectively, matching those used in MTMA; STSA-F and STSA-S refer to single-agent baselines trained with the same fast or slow step sizes as in MTMA. Curves are averaged over 20 independent runs, with solid lines indicating the mean and shaded areas showing $95\%$ confidence intervals.
  • ...and 9 more figures
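
The two-timescale idea in the Figure 3 caption can be illustrated with a minimal sketch. It is not the paper's MTMA algorithm: the quadratic objective, the noise level, the step-size exponents, and the function name `two_timescale_sa` are all assumptions chosen for illustration. The fast variable $x$ tracks its conditional optimum $x^*(y)$ while the slow variable $y$ drifts toward $y^*$, because the slow step size shrinks faster than the fast one.

```python
import random


def two_timescale_sa(n_steps=20000, noise=0.1, seed=0):
    """Toy two-timescale stochastic approximation (illustrative assumption).

    Maximizes f(x, y) = -(x - y)**2 - (y - 1)**2 from noisy gradients.
    For this toy objective x*(y) = y and the joint optimum is (1, 1).
    """
    rng = random.Random(seed)
    x, y = 3.0, -2.0  # arbitrary starting joint policy (x[0], y[0])
    for n in range(1, n_steps + 1):
        a_n = 0.3 / n ** 0.6  # fast step size, used for x
        b_n = 0.3 / n ** 0.9  # slow step size, used for y; b_n / a_n -> 0
        # Noisy gradient estimates of f at the current iterate.
        grad_x = -2.0 * (x - y) + rng.gauss(0.0, noise)
        grad_y = 2.0 * (x - y) - 2.0 * (y - 1.0) + rng.gauss(0.0, noise)
        # Simultaneous stochastic gradient-ascent updates on both timescales.
        x += a_n * grad_x
        y += b_n * grad_y
    return x, y


if __name__ == "__main__":
    x_final, y_final = two_timescale_sa()
    print(f"x = {x_final:.3f}, y = {y_final:.3f}")
```

In this toy setting both iterates should end up close to the joint optimum $(1, 1)$. Setting the two exponents equal recovers a single-timescale scheme, in the spirit of the STMA baselines named in the Figure 5 caption.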

Theorems & Definitions (8)

  • Proposition 1
  • Proposition 2
  • Proposition 3: Demand Smoothing
  • Proposition 4: Adaptive Ordering
  • Theorem 1
  • Theorem 2: Asymptotics for the Fast-Timescale Agent
  • Theorem 3: Asymptotics for the Slow-Timescale Agent
  • Lemma EC.1: Fast-timescale convergence to conditional optimum