Identifying Differential Patient Care Through Inverse Intent Inference

Hyewon Jeong, Siddharth Nayak, Taylor Killian, Sanjat Kanjilal

TL;DR

We apply several reinforcement learning techniques, including behavioral cloning, imitation learning, and inverse reinforcement learning, to learn the optimal policy for managing septic patient subgroups from expert demonstrations. The learned policy is then used to estimate the counterfactual treatment policy and identify deviations across sub-populations of interest.

Abstract

Sepsis is a life-threatening condition defined by end-organ dysfunction due to a dysregulated host response to infection. Although the Surviving Sepsis Campaign has been releasing sepsis treatment guidelines to unify and normalize the care of sepsis patients, numerous studies have reported disparities in care across the trajectory of the patient stay in the emergency department and intensive care unit. Here, we apply a number of reinforcement learning techniques, including behavioral cloning, imitation learning, and inverse reinforcement learning, to learn the optimal policy in the management of septic patient subgroups using expert demonstrations. We then estimate the counterfactual optimal policies by applying the model to another, unseen subset of the patient population, and identify differences in care by comparing the counterfactual policy to the real policy. Our data come from the sepsis cohort of MIMIC-IV and the clinical data warehouses of the Mass General Brigham healthcare system. The ultimate objective of this work is to use the optimal learned policy function to estimate the counterfactual treatment policy and identify deviations across sub-populations of interest. We hope this approach will help us identify disparities in care as well as changes in care in response to the publication of national sepsis treatment guidelines.

Paper Structure

This paper contains 14 sections, 4 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Analytic approach to identify differences in sepsis treatment policies across patient subgroups with Imitation Learning and Behavioral Cloning. We let a model agent learn the expert trajectories in one patient subgroup, then apply the learned agent to another patient subgroup to measure the discrepancy between the counterfactual policy $\pi'_B = \pi'(\mathbf{z}_i|A=B)$, where $\mathbf{z}_i$ is the set of state transitions conditioned on the patient attribute $A=B$, and the original policy $\pi_B$.
  • Figure 2: Action trajectories of patients across timesteps
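
The subgroup comparison described for Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete states and actions, replaces the learned agent with a simple count-based (tabular) behavioral cloning policy, and uses synthetic data. The function names `fit_behavioral_cloning` and `disagreement_rate` and the toy treatment rule are hypothetical.

```python
import numpy as np


def fit_behavioral_cloning(states, actions, n_states, n_actions, alpha=1.0):
    """Tabular behavioral cloning: estimate pi(a | s) from expert (state, action) pairs."""
    # Start from alpha pseudo-counts (Laplace smoothing) so unseen pairs keep nonzero mass.
    counts = np.full((n_states, n_actions), alpha, dtype=float)
    np.add.at(counts, (states, actions), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def disagreement_rate(policy, states, actions):
    """Fraction of observed actions that differ from the cloned policy's greedy action."""
    counterfactual = policy.argmax(axis=1)[states]
    return float(np.mean(counterfactual != actions))


# Synthetic data only (assumption): 5 discretized patient states, 3 treatment actions.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Subgroup A: clinicians follow a fixed toy rule, action = state mod 3.
states_a = rng.integers(0, n_states, size=2000)
actions_a = states_a % n_actions

# Subgroup B: same rule, except that state 4 receives action 0 instead of action 1.
states_b = rng.integers(0, n_states, size=2000)
actions_b = states_b % n_actions
actions_b[states_b == 4] = 0

# Learn the policy on subgroup A, then apply it to subgroup B as the counterfactual policy.
policy_a = fit_behavioral_cloning(states_a, actions_a, n_states, n_actions)
gap_a = disagreement_rate(policy_a, states_a, actions_a)
gap_b = disagreement_rate(policy_a, states_b, actions_b)

print(f"disagreement within subgroup A: {gap_a:.3f}")
print(f"disagreement when applied to subgroup B: {gap_b:.3f}")
```

In this toy setting the disagreement on subgroup B is concentrated in the one state where the treatment rule differs, which is the kind of deviation the counterfactual comparison is meant to surface.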