LUME-DBN: Full Bayesian Learning of DBNs from Incomplete data in Intensive Care

Federico Pirola, Fabio Stella, Marco Grzegorczyk

TL;DR

The paper tackles learning temporal dependencies in ICU time-series with frequent missing data by proposing LUME-DBN, a full Bayesian DBN learning framework with a Gibbs sampling imputation step. It models each variable with a Bayesian linear regression over a one-slice lag, derives tractable full conditional distributions for missing values, and jointly learns structure and parameters while imputing data. Across synthetic experiments and a PhysioNet ICU case study, LUME-DBN achieves superior network reconstruction (higher AUC-PR) and provides explicit uncertainty quantification for both missing data and network structure, outperforming model-agnostic baselines like MICE and Temporal MICE. The approach enhances clinical decision support by yielding safer imputations and more reliable temporal inferences, with clear avenues for extensions to MNAR, non-homogeneous, and expert-informed DBNs.

Abstract

Dynamic Bayesian networks (DBNs) are increasingly used in healthcare due to their ability to model complex temporal relationships in patient data while maintaining interpretability, an essential feature for clinical decision-making. However, existing approaches to handling missing data in longitudinal clinical datasets are largely derived from static Bayesian networks literature, failing to properly account for the temporal nature of the data. This gap limits the ability to quantify uncertainty over time, which is particularly critical in settings such as intensive care, where understanding the temporal dynamics is fundamental for model trustworthiness and applicability across diverse patient groups. Despite the potential of DBNs, a full Bayesian framework that integrates missing data handling remains underdeveloped. In this work, we propose a novel Gibbs sampling-based method for learning DBNs from incomplete data. Our method treats each missing value as an unknown parameter following a Gaussian distribution. At each iteration, the unobserved values are sampled from their full conditional distributions, allowing for principled imputation and uncertainty estimation. We evaluate our method on both simulated datasets and real-world intensive care data from critically ill patients. Compared to standard model-agnostic techniques such as MICE, our Bayesian approach demonstrates superior reconstruction accuracy and convergence properties. These results highlight the clinical relevance of incorporating full Bayesian inference in temporal models, providing more reliable imputations and offering deeper insight into model behavior. Our approach supports safer and more informed clinical decision-making, particularly in settings where missing data are frequent and potentially impactful.
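The imputation step described above (each missing value treated as a Gaussian unknown and resampled from its full conditional at every iteration) can be illustrated on a much simpler model than the paper's. The sketch below is not the authors' implementation: it assumes a single variable following a Gaussian AR(1) process with known, fixed parameters `a`, `b` and `s2`, whereas LUME-DBN samples structure and parameters jointly with the missing values. The function names `full_conditional` and `gibbs_impute` are hypothetical.

```python
import math
import random


def full_conditional(x_prev, x_next, a, b, s2):
    """Mean and variance of a missing x_t given its temporal neighbours.

    Assumed model (illustration only): x_t = a + b * x_{t-1} + eps,
    with eps ~ N(0, s2). Pass x_next=None when x_t is the last point.
    """
    prior_mean = a + b * x_prev
    if x_next is None:
        # No successor: only the transition from x_{t-1} informs x_t.
        return prior_mean, s2
    # Combine the transition into x_t with the transition out of x_t.
    precision = (1.0 + b * b) / s2
    mean = (prior_mean + b * (x_next - a)) / (1.0 + b * b)
    return mean, 1.0 / precision


def gibbs_impute(series, a, b, s2, n_iter=200, seed=0):
    """Return the sampled values for every missing entry (None) in series."""
    rng = random.Random(seed)
    x = list(series)
    missing = [t for t, v in enumerate(x) if v is None]
    if 0 in missing:
        raise ValueError("this sketch assumes the first value is observed")
    observed = [v for v in x if v is not None]
    start = sum(observed) / len(observed)
    for t in missing:
        x[t] = start  # initialise missing entries at the observed mean
    draws = {t: [] for t in missing}
    for _ in range(n_iter):
        for t in missing:
            x_next = x[t + 1] if t + 1 < len(x) else None
            mean, var = full_conditional(x[t - 1], x_next, a, b, s2)
            x[t] = rng.gauss(mean, math.sqrt(var))
            draws[t].append(x[t])
    return draws


if __name__ == "__main__":
    samples = gibbs_impute([1.0, None, 1.2, None], a=0.0, b=0.9, s2=0.25)
    for t, values in samples.items():
        print(t, sum(values) / len(values))
```

Keeping every draw, rather than a single imputed value, is what allows the spread of the samples to serve as an uncertainty estimate for each missing entry.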

Paper Structure

This paper contains 21 sections, 12 equations, 8 figures, 3 tables, 4 algorithms.

Figures (8)

  • Figure 1: Area Under the Precision-Recall Curve for different experimental settings (sample sizes, missingness rates and imputation methods). For each experimental condition, a paired t-test compares the LUME-DBN AUC against each baseline method's AUC; p-values < 0.05 are marked with a '$\star$' colored according to the baseline method. Error bars represent the 95% confidence intervals for each experimental setting.
  • Figure 2: Reconstructed DBNs for each ICU type, averaged over five independent simulations after local data standardization. A threshold of 0.8 is applied to the averaged inclusion probabilities. Arcs represent temporal relationships with a single temporal lag, namely between nodes at time $t-1$ and nodes at time $t$.
  • Figure A.3: Two examples of DBNs. a) A DBN with 3 temporal nodes and 2 arcs. b) A more complex DBN with 5 temporal nodes, with a node $X_3^t$ with 2 parents $\{X_1^{t-1}, X_2^{t-1}\}$, 2 children $\{X_4^{t+1}, X_5^{t+1}\}$ and one node with a common child $\{X_4^{t}\}$.
  • Figure D.4: Convergence Diagnostic for Network reconstruction averaged over 5 simulations for simulated datasets with different missingness rates.
  • Figure D.5: Convergence Diagnostic for Missing Value imputation averaged over 5 simulations for simulated datasets with different missingness rates.
  • ...and 3 more figures
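
The edge-selection rule in the Figure 2 caption (average the inclusion probabilities over independent runs, then apply a 0.8 threshold) can be sketched as follows. This is only an illustration: the function name `consensus_edges` is hypothetical, the matrix convention (entry `[i][j]` is the probability of an arc from node `i` at time $t-1$ to node `j` at time $t$) is an assumption, and whether the threshold is inclusive is not stated in the caption.

```python
def consensus_edges(runs, threshold=0.8):
    """Average per-run inclusion probabilities and keep arcs above threshold.

    runs: list of square matrices (lists of lists), one per independent run.
    Assumed convention: runs[r][i][j] is the posterior probability of the
    arc from node i at time t-1 to node j at time t in run r.
    """
    n_runs = len(runs)
    n_nodes = len(runs[0])
    edges = []
    for i in range(n_nodes):
        for j in range(n_nodes):
            avg = sum(run[i][j] for run in runs) / n_runs
            if avg >= threshold:  # inclusive comparison is an assumption
                edges.append((i, j, avg))
    return edges


if __name__ == "__main__":
    # Two made-up runs over two nodes, for illustration only.
    toy_runs = [
        [[0.9, 0.1], [0.5, 1.0]],
        [[0.9, 0.3], [0.7, 1.0]],
    ]
    for parent, child, prob in consensus_edges(toy_runs):
        print(f"X{parent}(t-1) -> X{child}(t): {prob:.2f}")
```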