Integrating Probabilistic Trees and Causal Networks for Clinical and Epidemiological Data

Sheresh Zahoor, Pietro Liò, Gaël Dias, Mohammed Hasanuzzaman

TL;DR

The paper tackles the need for healthcare models that deliver causal insights alongside predictions. It introduces Probabilistic Causal Fusion (PCF), a framework that fuses Causal Bayesian Networks with ensembles of Probability Trees, using a CBN-derived order to structure PTrees and enable interventions and counterfactual reasoning. PCF demonstrates predictive performance on par with established models across MIMIC-IV, Framingham, and BRFSS-2015 diabetes data, while also offering sensitivity analysis and SHAP-based interpretability to illuminate causal factors. The approach supports both individual and population-level decision-making through counterfactual analyses and a centralized causal knowledge repository, potentially enhancing evidence-based clinical practice and cross-institution collaboration.

Abstract

Healthcare decision-making requires not only accurate predictions but also insights into how factors influence patient outcomes. While traditional Machine Learning (ML) models excel at predicting outcomes, such as identifying high-risk patients, they are limited in addressing what-if questions about interventions. This study introduces the Probabilistic Causal Fusion (PCF) framework, which integrates Causal Bayesian Networks (CBNs) and Probability Trees (PTrees) to extend beyond predictions. PCF leverages causal relationships from CBNs to structure PTrees, enabling both the quantification of factor impacts and the simulation of hypothetical interventions. PCF was validated on three real-world healthcare datasets, namely MIMIC-IV, Framingham Heart Study, and Diabetes, chosen for their clinically diverse variables. It demonstrated predictive performance comparable to traditional ML models while providing additional causal reasoning capabilities. To enhance interpretability, PCF incorporates sensitivity analysis and SHapley Additive exPlanations (SHAP). Sensitivity analysis quantifies the influence of causal parameters on outcomes such as Length of Stay (LOS), Coronary Heart Disease (CHD), and Diabetes, while SHAP highlights the importance of individual features in predictive modeling. By combining causal reasoning with predictive modeling, PCF bridges the gap between clinical intuition and data-driven insights. Its ability to uncover relationships between modifiable factors and simulate hypothetical scenarios provides clinicians with a clearer understanding of causal pathways. This approach supports more informed, evidence-based decision-making, offering a robust framework for addressing complex questions in diverse healthcare settings.
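
To make the difference between a prediction and a what-if question concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical three-variable system with causal order Z, X, Y (Z a confounder, X a modifiable factor, Y an outcome) and made-up probabilities. A probability tree is represented simply as the set of root-to-leaf paths in that order; an intervention do(X = x) is modelled by replacing the branch probabilities at the X level with a point mass. All names (`P_Z`, `P_X_GIVEN_Z`, `P_Y1_GIVEN_ZX`, `tree_paths`, `prob`) are illustrative.

```python
from itertools import product

# Hypothetical toy parameters (assumptions for illustration only).
P_Z = {0: 0.7, 1: 0.3}                                     # P(Z = z)
P_X_GIVEN_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # P(X = x | Z = z)
P_Y1_GIVEN_ZX = {(0, 0): 0.1, (0, 1): 0.2,
                 (1, 0): 0.4, (1, 1): 0.6}                 # P(Y = 1 | Z = z, X = x)


def tree_paths(do_x=None):
    """Return {(z, x, y): probability} for every root-to-leaf path.

    Variables are expanded in the causal order Z, X, Y. If do_x is given,
    the X branches are overwritten with a point mass on do_x, which is how
    an intervention is represented in this sketch.
    """
    paths = {}
    for z, x, y in product((0, 1), repeat=3):
        p_x = P_X_GIVEN_Z[z][x] if do_x is None else float(x == do_x)
        p_y1 = P_Y1_GIVEN_ZX[(z, x)]
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        paths[(z, x, y)] = P_Z[z] * p_x * p_y
    return paths


def prob(paths, event, given=lambda path: True):
    """P(event | given), computed by summing path probabilities."""
    numerator = sum(p for path, p in paths.items() if event(path) and given(path))
    denominator = sum(p for path, p in paths.items() if given(path))
    return numerator / denominator


# Conditioning: among cases where X = 1 was observed, how often is Y = 1?
observational = prob(tree_paths(), lambda s: s[2] == 1, lambda s: s[1] == 1)

# Intervening: if X were set to 1 for everyone, how often would Y = 1?
interventional = prob(tree_paths(do_x=1), lambda s: s[2] == 1)

print(f"P(Y=1 | X=1)     = {observational:.2f}")
print(f"P(Y=1 | do(X=1)) = {interventional:.2f}")
```

With these made-up numbers the two quantities differ (about 0.44 versus 0.32) because Z influences both X and Y. Conditioning on X = 1 also shifts the distribution of Z, whereas the intervention leaves it untouched.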

Paper Structure

This paper contains 40 sections, 6 equations, 11 figures, 6 tables, 2 algorithms.

Figures (11)

  • Figure 1: Different steps involved in the PCF framework. The first module addresses data pre-processing to shape the input required for the CBN. The next module involves generating individual CBNs and creating a model-averaging graph. Subsequently, the ensemble of PTrees is developed based on the variable order from the model-averaging graph. The final module involves evaluating the overall performance of PCF.
  • Figure 2: Sensitivity Analysis for LOS, Diabetes, and Framingham datasets.
  • Figure 3: SHAP plot showing feature impacts on predictions for LOS, CHD, and Diabetes.
  • Figure 4: Probability change of los given interventions on heart_rate, Urea_Nitrogen (UN), RDW, Creatinine, Glucose, temperature (temp), saturation (sat), and respiration rate (resp).
  • Figure 5: Probability change of TenYearCHD given interventions on sysBP, diaBP, totChol, BMI, education, glucose, heartRate and cigsPerDay.
  • ...and 6 more figures