Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources

Yikuan Li, Chengsheng Mao, Kaixuan Huang, Hanyin Wang, Zheng Yu, Mengdi Wang, Yuan Luo

TL;DR

Scarcity of critical care resources motivates a data-driven, fairness-aware allocation policy. The paper introduces a Transformer-based Q-network (TxDDQN) that jointly accounts for individual disease trajectories and inter-patient interactions within a day-to-day MDP, optimized under capacity and fairness constraints. Using real-world ICU data, the method improves survival under ventilator shortages while achieving more equitable distributions across ethnoracial groups, as measured by a fairness penalty based on KL-divergence ($D_{KL}$) and the demographic parity ratio (DPR). The results suggest that structured RL with fairness integration can inform crisis standards of care and guide allocation policies during health emergencies.
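
The two fairness measures named above can be illustrated with a minimal sketch. The definitions used here are assumptions, since this summary does not spell out the paper's exact formulas: the KL term compares the share of ventilators each group receives with the share each group requested, and the DPR is taken as the smallest group allocation rate divided by the largest. The function names and the per-group counts are hypothetical.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def demographic_parity_ratio(allocated, requested):
    """Smallest group allocation rate divided by the largest.

    Assumes every group requested at least one ventilator and at least
    one group received one. A value of 1.0 means all groups are served
    at the same rate.
    """
    rates = [a / r for a, r in zip(allocated, requested)]
    return min(rates) / max(rates)


# Hypothetical per-group counts (illustration only, not data from the paper).
requested = [100, 50, 50]
allocated = [60, 20, 20]

# Assumed fairness penalty: divergence between who received ventilators
# and who asked for them.
penalty = kl_divergence(normalize(allocated), normalize(requested))
dpr = demographic_parity_ratio(allocated, requested)
```

Under these assumed definitions, an allocation that serves every group in proportion to its requests gives a penalty of 0 and a DPR of 1; larger penalties and smaller ratios indicate a less even distribution.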

Abstract

Scarcity of health care resources can make rationing unavoidable. For example, ventilators are often limited in supply, especially during public health emergencies such as the COVID-19 pandemic or in resource-constrained health care settings. Currently, there is no universally accepted standard for health care resource allocation protocols, so different governments prioritize patients based on various criteria and heuristic-based protocols. In this study, we investigate the use of reinforcement learning to optimize critical care resource allocation policies so that resources are rationed fairly and effectively. We propose a transformer-based deep Q-network that integrates the disease progression of individual patients and the interaction effects among patients during critical care resource allocation. We aim to improve both fairness of allocation and overall patient outcomes. Our experiments demonstrate that our method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, when compared to existing severity-based and comorbidity-based methods in use by different governments. Our source code is included in the supplement and will be released on GitHub upon publication.

Paper Structure (23 sections, 2 equations, 7 figures, 8 tables, 1 algorithm)

This paper contains 23 sections, 2 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: An illustration for our study formulation. Each color represents a separate patient.
  • Figure 2: Impact of triage protocols on survival rates and allocation rates under varying levels of ventilator shortages. The maximum daily demand for ventilators in the testing set is considered as full capacity (100%). We scale the number of survivors to a range of [0, 100%] to represent the survival rate. The allocation rate is calculated by dividing the total number of ventilators allocated by the total number of ventilators requested. The bar plot associated with each panel indicates the area under the survival-capacity curve and the allocation-capacity curve, respectively, where a larger value indicates that the protocol can save more lives across different levels of shortages. Notably, the MP and SOFA curves overlap, indicating similar allocation patterns. Similarly, the lottery and youngest curves lie close together, as do our three TxDDQN configurations.
  • Figure 3: Allocation rates across protocols and ethnoracial groups. Each panel illustrates how allocation rates vary by ethnoracial group under different protocols. The numbers in the legend indicate the area under the allocation-capacity curve (AUACC).
  • Figure A1: Daily Ventilator Demands
  • Figure A2: The survival-fairness Pareto frontier was examined under a 50% shortage of ventilators. Each data point corresponds to the outcomes from distinct values of $\lambda$, which balance the trade-off between allocation effectiveness and fairness. As $\lambda$ increases (red dots move to the right), fairness can be enhanced until a turning point is reached, at which the survival rate begins to decline. If $\lambda$ is increased to an exceedingly large value (e.g., 1e6), the model will tend to favor a protocol that prioritizes fairness without due consideration for life-saving. We reported the results from the turning point ($\lambda$ = 1e3), where the model is enhanced in fairness without compromising the survival rate.
  • ...and 2 more figures
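
The quantities described in the Figure 2 and Figure 3 captions can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the allocation rate follows the caption's definition (ventilators allocated divided by ventilators requested), while the area under a survival-capacity or allocation-capacity curve is approximated here with the trapezoidal rule, which the captions do not specify. The function names and the sample numbers are hypothetical.

```python
def allocation_rate(allocated_per_day, requested_per_day):
    """Total ventilators allocated divided by total ventilators requested."""
    return sum(allocated_per_day) / sum(requested_per_day)


def area_under_curve(capacities, values):
    """Trapezoidal area under a curve sampled at increasing capacity levels.

    capacities: capacity levels as fractions of full capacity, ascending.
    values:     survival or allocation rate observed at each capacity level.
    """
    area = 0.0
    for i in range(1, len(capacities)):
        width = capacities[i] - capacities[i - 1]
        area += width * (values[i] + values[i - 1]) / 2
    return area


# Hypothetical curve for one protocol (illustration only, not paper results).
capacities = [0.0, 0.5, 1.0]
survival = [0.0, 0.8, 1.0]
auc = area_under_curve(capacities, survival)

# Hypothetical two-day example: 7 of 10 requested ventilators are allocated.
rate = allocation_rate([3, 4], [5, 5])
```

With capacity expressed as a fraction of full capacity, the area lies between 0 and 1, so protocols can be compared across all shortage levels with a single number.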