Inpatient Overflow Management with Proximal Policy Optimization

Jingjing Sun, Jim Dai, Pengyi Shi

TL;DR

This work addresses scalable overflow management in inpatient hospital systems by modeling overflow decisions as a time-periodic, long-run average cost MDP with large state/action spaces. The authors introduce atomic actions to decompose multi-patient routing into tractable sequential decisions and couple this with a randomized PPO framework, enhanced by a partially-shared policy network and a queueing-informed linear value function. The approach achieves near-optimal performance on five-, ten-, and twenty-pool systems, matching or outperforming approximate dynamic programming (ADP) while dramatically reducing computation time and data requirements. The combination of domain-specific policy design, batching, and pool-wise value decomposition yields strong sample efficiency and scalability, with practical implications for real-world hospital overflow management. The work demonstrates that tailoring general RL methods to queueing-structured, time-periodic problems can yield substantial gains in both performance and efficiency, enabling deployment in large-scale healthcare operations.

Abstract

Overflowing patients to non-primary wards can effectively alleviate congestion in hospitals, but undesired overflow also leads to issues such as mismatched service quality. Therefore, we need to trade off congestion against undesired overflow. This overflow management problem is modeled as a discrete-time Markov decision process with large state and action spaces. To overcome the curse of dimensionality, we decompose the action at each time into a sequence of atomic actions and use an actor-critic algorithm, Proximal Policy Optimization (PPO), to guide the atomic actions. Moreover, we tailor the design of the neural network that represents the policy to account for the daily periodic pattern of the system flows. Under hospital settings of different scales, the PPO policies consistently outperform commonly used state-of-the-art policies.
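
The atomic-action idea can be illustrated with a small sketch: instead of choosing one joint routing for all waiting patients, the controller makes one decision per patient in sequence. Everything below is hypothetical and not taken from the paper: the names `atomic_policy` and `decide_epoch`, the uniform random choice standing in for the learned PPO policy, and the assumption that a patient can only be routed to a pool with a free bed.

```python
import random


def atomic_policy(patient_class, free_beds, allowed_pools, rng):
    """Hypothetical stand-in for the learned policy.

    Picks uniformly among the allowed pools that still have a free bed,
    plus the option None, which means "keep the patient waiting".
    """
    options = [p for p in allowed_pools[patient_class] if free_beds[p] > 0]
    options.append(None)  # waiting is always available
    return rng.choice(options)


def decide_epoch(waiting, free_beds, allowed_pools, rng):
    """Route all waiting patients one at a time (one atomic action each).

    waiting:       dict class -> number of waiting patients
    free_beds:     dict pool  -> number of free beds (not modified in place)
    allowed_pools: dict class -> list of pools that class may be sent to
    Returns (assignment counts {(class, pool): n}, remaining free beds).
    """
    free_beds = dict(free_beds)  # work on a copy
    assignments = {}
    for patient_class, count in waiting.items():
        for _ in range(count):
            pool = atomic_policy(patient_class, free_beds, allowed_pools, rng)
            if pool is None:
                continue  # this patient keeps waiting
            free_beds[pool] -= 1  # later atomic actions see the updated beds
            key = (patient_class, pool)
            assignments[key] = assignments.get(key, 0) + 1
    return assignments, free_beds
```

Each atomic decision ranges over a handful of pools, so the per-decision action set stays small even when the number of joint routings for all waiting patients is very large.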

Paper Structure

This paper contains 50 sections, 3 theorems, 99 equations, 8 figures, 13 tables, 1 algorithm.

Key Result

Lemma 1

Under the batching setup, for a given system-level state $s$ and any pre-determined rule for choosing customer order $\boldsymbol{\sigma}(s)$, the probability of taking action $f$ follows a multinomial distribution that is independent of the order $\boldsymbol{\sigma}(s)$, i.e.,
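
The order-independence claim in Lemma 1 can be checked numerically on a toy instance. The sketch below rests on an assumption about the batching setup that is not spelled out in this summary: within one epoch, every customer of a given class draws its pool from the same fixed probability vector, regardless of the decisions made for earlier customers. The function names, class labels, and probabilities are illustrative only.

```python
from itertools import permutations, product
from math import factorial, isclose


def pmf_by_enumeration(order, probs, n_pools):
    """Exact distribution of per-class pool counts when customers are routed in `order`.

    order: sequence of class labels, one entry per waiting customer
    probs: dict class -> list of pool probabilities (fixed within the epoch)
    Returns dict: ((class, counts-per-pool), ...) sorted by class -> probability.
    """
    classes = sorted(set(order))
    dist = {}
    for choice in product(range(n_pools), repeat=len(order)):
        p = 1.0
        counts = {c: [0] * n_pools for c in classes}
        for cls, pool in zip(order, choice):
            p *= probs[cls][pool]
            counts[cls][pool] += 1
        key = tuple((c, tuple(counts[c])) for c in classes)
        dist[key] = dist.get(key, 0.0) + p
    return dist


def multinomial_pmf(counts, p):
    """Multinomial probability of `counts` with cell probabilities `p`."""
    coef = factorial(sum(counts))
    prob = 1.0
    for k, pk in zip(counts, p):
        coef //= factorial(k)
        prob *= pk ** k
    return coef * prob


def order_independent(customers, probs, n_pools):
    """True if every ordering of `customers` yields the same count distribution."""
    reference = pmf_by_enumeration(customers, probs, n_pools)
    for order in set(permutations(customers)):
        dist = pmf_by_enumeration(order, probs, n_pools)
        if dist.keys() != reference.keys():
            return False
        if any(not isclose(dist[k], reference[k], abs_tol=1e-12) for k in reference):
            return False
    return True
```

Under this assumption, the count distribution for each class is a multinomial over pools, and the joint distribution is the product over classes, which is why the processing order drops out.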

Figures (8)

  • Figure 1: Illustration of the atomic decision process.
  • Figure 2: Three policy network structures. The fully-connected network takes the epoch index as part of the input and uses a single network to learn actions across all epochs. The fully-separate network trains $m$ separate networks, one for each of the $m$ epochs. The partially-shared network excludes the epoch index from the input but uses different blocks of output neurons for actions of different epochs. The parameters of the first several layers are shared across epochs, while those of the later layers are specific to each epoch.
  • Figure 3: Illustration of the ten-class ten-pool system based on practice in our partner hospital. The solid black arrow depicts the primary assignment route, while red and blue dashed arrows show preferred and secondary overflow assignments within a department. Additionally, orange and green dotted arrows represent cross-department overflow assignments (only from VIP to regular).
  • Figure 4: Performance comparison in the ten-pool and twenty-pool systems.
  • Figure 5: Trade-off between performance and training days and epochs. Performance is measured as the percentage improvement in average cost of the PPO policy over the best-performing benchmark (the empirical policy).
  • ...and 3 more figures
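
The partially-shared structure described in the Figure 2 caption can be sketched as a forward pass with one shared hidden layer and one output head per epoch. This is a minimal illustration in dependency-free Python, not the authors' architecture: the class name `PartiallySharedPolicy`, the layer sizes, the tanh activation, and the random initialization are all assumptions.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


class PartiallySharedPolicy:
    """Shared hidden layer plus one output head per epoch (illustrative sizes)."""

    def __init__(self, state_dim, hidden_dim, n_actions, n_epochs, seed=0):
        rng = random.Random(seed)
        # Parameters shared by all epochs.
        self.shared = make_matrix(hidden_dim, state_dim, rng)
        # Epoch-specific output blocks, one per epoch of the day.
        self.heads = [make_matrix(n_actions, hidden_dim, rng) for _ in range(n_epochs)]

    def action_probs(self, state, epoch):
        """The epoch index is not an input feature; it only selects the head."""
        hidden = [math.tanh(v) for v in matvec(self.shared, state)]
        return softmax(matvec(self.heads[epoch], hidden))
```

For example, `PartiallySharedPolicy(4, 8, 3, 8).action_probs([1.0, 0.0, 2.0, 1.0], epoch=0)` returns a probability vector over three hypothetical atomic actions. Calling it with a different `epoch` reuses the same hidden features but a different output head.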

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Proposition 1
  • Lemma 2