A Bayesian Finite Mixture Model Approach for Mixed-type Data Clustering and Variable Selection with Censored Biomarkers

Yueting Wang; Shu Wang; Jonathan G. Yabes; Chung-Chou H. Chang

A Bayesian Finite Mixture Model Approach for Mixed-type Data Clustering and Variable Selection with Censored Biomarkers

Yueting Wang, Shu Wang, Jonathan G. Yabes, Chung-Chou H. Chang

Abstract

Clustering mixed-type data remains a major challenge in biomedical research to uncover clinically meaningful subgroups within heterogeneous patient populations. Most existing clustering methods impose restrictive assumptions like local independence, fail to accommodate censored biomarkers, or unable to quantify variable importance. We propose a Bayesian finite mixture model (BFMM) clustering framework that addresses these limitations. BFMM flexibly models both continuous and categorical variables, incorporates three covariance structures to capture cluster-specific dependencies among continuous features, and handles censored observations through likelihood-based imputation. To facilitate feature prioritization, BFMM uses spike-and-slab priors to estimate variable importance on a continuous 0-1 scale. Simulation studies demonstrate that BFMM outperforms existing methods in clustering accuracy, particularly given strong within-cluster correlation or censored variables, and reliably distinguishes informative features from noise under varying conditions. We applied BFMM to two real-world datasets: (1) the SENECA cohort integrating electronic health records from patients with Sepsis; and (2) the EDEN randomized trial of patients with acute lung injury. In both settings, BFMM identified clinically interpretable phenotypes and revealed variable-specific contributions to subgroup differentiation. In the EDEN trial, it also uncovered evidence of treatment heterogeneity. These findings validate BFMM as an effective, interpretable, and practically useful clustering tool for complex biomedical datasets.

A Bayesian Finite Mixture Model Approach for Mixed-type Data Clustering and Variable Selection with Censored Biomarkers

Abstract

A Bayesian Finite Mixture Model Approach for Mixed-type Data Clustering and Variable Selection with Censored Biomarkers

Abstract

Paper Structure

Table of Contents

Figures (8)