
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition

Jim Dai, Manxi Wu, Zhanhao Zhang

TL;DR

The paper tackles scalable control of stochastic processing networks (SPNs) under batched decisions by introducing atomic action decomposition, which breaks joint server assignments into a sequence of single-server actions without loss of optimality. It defines step-dependent and step-independent atomic policies, proves that both achieve the same optimal long-run average reward as full joint policies, and establishes the theoretical bridge through optimality equations and passing-last reductions. Building on this, the Atomic-PPO algorithm combines atomic decomposition with proximal policy optimization (PPO) to learn scalable policies in large SPNs, and it is validated on hospital overflow, switch scheduling, and ride-hailing domains. The approach reduces the action space to $O(J^2)$ choices per atomic step and delivers strong empirical performance with practical training efficiency, offering a principled explanation for the success of Atomic-PPO in large-scale systems.
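To make "a sequence of single-server actions" concrete, here is a minimal sketch of how one batched (joint) assignment can be built atomically. It is not the authors' implementation: the state representation (idle servers per server type, waiting jobs per job class), the `(server_type, job_class)` atomic action, the "pass" action that ends the batch, and all function names (`feasible_pairs`, `atomic_rollout`, `greedy_policy`) are hypothetical assumptions for illustration. Under these assumptions each atomic step chooses among at most (number of server types) × (number of job classes) pairs, which is $J^2$ if both are indexed by $J$ classes, no matter how many servers there are.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical atomic action: assign one server of a type to one job of a class.
Pair = Tuple[int, int]  # (server_type, job_class)
Policy = Callable[[List[int], List[int], List[Pair]], Optional[Pair]]


def feasible_pairs(idle: List[int], waiting: List[int]) -> List[Pair]:
    """All atomic actions available at the current atomic step."""
    return [(i, j)
            for i in range(len(idle)) if idle[i] > 0
            for j in range(len(waiting)) if waiting[j] > 0]


def atomic_rollout(idle: List[int], waiting: List[int],
                   policy: Policy) -> Dict[Pair, int]:
    """Build one joint assignment from a sequence of atomic actions.

    The policy returns a single (server_type, job_class) pair, or None to
    "pass", which ends the batch (an assumed convention). The state is
    updated after every atomic action, so later choices see earlier ones.
    """
    idle, waiting = list(idle), list(waiting)  # copy: caller's lists untouched
    joint: Counter = Counter()
    while True:
        options = feasible_pairs(idle, waiting)
        if not options:          # no idle server or no waiting job left
            break
        choice = policy(idle, waiting, options)
        if choice is None:       # pass action: stop assigning in this batch
            break
        i, j = choice
        idle[i] -= 1
        waiting[j] -= 1
        joint[choice] += 1
    return dict(joint)


def greedy_policy(idle: List[int], waiting: List[int],
                  options: List[Pair]) -> Optional[Pair]:
    """Toy policy: serve the longest queue, preferring lower server types."""
    return max(options, key=lambda p: (waiting[p[1]], -p[0]))


if __name__ == "__main__":
    print(atomic_rollout([2, 1], [3, 1], greedy_policy))  # {(0, 0): 2, (1, 0): 1}
```

In a learning setting such as the PPO-based approach described above, the toy `greedy_policy` would be replaced by a learned policy over the small per-step choice set; that substitution is an assumption of this sketch, not a description of Atomic-PPO's internals.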

Abstract

Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. Controlling an SPN means dynamically assigning servers in batches under uncertainty to optimize long-run performance. This problem is challenging because the policy dimension grows exponentially with the number of servers, which makes standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies whose dimension is constant, independent of the number of servers. We study two classes of atomic policies, step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that the optimal SPN control can be computed scalably, without loss of optimality, using the atomic framework. Our results offer a theoretical justification for the strong empirical success of the atomic framework reported in prior work on large-scale applications.
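The contrast between exponential and constant policy dimension can be illustrated by counting actions. This is a simplified sketch, assuming distinguishable servers that each independently pick one of `num_classes` options (which may hypothetically include "idle"); it is not the paper's exact action-space model, and the function names are illustrative only.

```python
def joint_action_count(num_servers: int, num_classes: int) -> int:
    """Joint actions when every server picks one option at once.

    Choices multiply across servers, so the count is num_classes ** num_servers.
    """
    return num_classes ** num_servers


def atomic_action_count(num_classes: int) -> int:
    """Choices at a single atomic step, which assigns exactly one server.

    This does not depend on the number of servers at all.
    """
    return num_classes


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, joint_action_count(n, 4), atomic_action_count(4))
    # 1 4 4
    # 5 1024 4
    # 10 1048576 4
```

With identical servers the joint count grows more slowly, but the qualitative point is the same: the joint choice set depends on the number of servers, while each atomic step does not.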

Paper Structure

This paper contains 15 sections, 9 theorems, 55 equations, 1 figure, and 1 algorithm.

Key Result

Theorem 1

$R(\hat{\pi}^*) = R(\pi^*)$.
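For reference, a common definition of the long-run average reward in average-reward MDPs is shown below. The notation $r$, $s_t$, $a_t$ and the choice of $\liminf$ are assumptions, not taken from the paper; reading $\pi^*$ as an optimal joint policy and $\hat{\pi}^*$ as an optimal atomic policy is inferred from the abstract.

$$
R(\pi) = \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T-1} r(s_t, a_t) \right],
\qquad
R(\hat{\pi}^*) = \max_{\hat{\pi}} R(\hat{\pi}) = \max_{\pi} R(\pi) = R(\pi^*).
$$

Under this reading, Theorem 1 states that restricting the search to atomic policies loses nothing in the optimal long-run average reward.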

Figures (1)

  • Figure 1: State transitions induced by atomic actions at each time step $t$.

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • Lemma 1 (Puterman)
  • Lemma 2
  • Proof
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Proof
  • Proof
  • ...and 5 more