Clustering Survival Data using a Mixture of Non-parametric Experts
Gabriel Buginga, Edmundo de Souza e Silva
TL;DR
SurvMixClust addresses the need to jointly cluster individuals and predict time-to-event outcomes in right-censored data. It models the data as a finite mixture of $K$ nonparametric survival distributions, with covariate-dependent mixing weights $\tau_k(\mathbf{x})$ learned via multinomial logistic regression and cluster-specific survival functions estimated with the Kaplan-Meier estimator. An EM-based training procedure alternates between computing responsibilities and updating the clustering and survival components; a stochastic variant improves scalability. Across five public datasets, the approach yields balanced, heterogeneous clusters with distinct survival curves, achieves predictive performance competitive with non-clustering survival models, and outperforms clustering baselines in several settings. These results highlight its potential for precision medicine and heterogeneous treatment effect analysis. The accompanying code supports practical adoption and further methodological development.
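The EM procedure summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several choices are assumptions: each cluster's curve is a weighted Kaplan-Meier estimate in which every individual is weighted by its responsibility, the gating is a softmax regression fitted by plain gradient ascent on soft labels, and the plain (not stochastic) EM is used. With observed time $t_i$ and event indicator $\delta_i$, the E-step computes $r_{ik} \propto \tau_k(\mathbf{x}_i)\, f_k(t_i)^{\delta_i}\, S_k(t_i)^{1-\delta_i}$, where $f_k(t_i)$ is taken to be the Kaplan-Meier jump $S_k(t_i^-) - S_k(t_i)$; this likelihood form is also an assumption. The helper names `weighted_km`, `km_step`, `fit_gating`, and `em_survmix` are hypothetical.

```python
import numpy as np


def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier: unique event times and S(t) just after each one."""
    event_times = np.unique(times[events == 1])
    surv = np.empty(len(event_times))
    s = 1.0
    for j, t in enumerate(event_times):
        at_risk = weights[times >= t].sum()                 # weighted number at risk at t
        died = weights[(times == t) & (events == 1)].sum()  # weighted events at t
        if at_risk > 0:
            s *= 1.0 - died / at_risk
        surv[j] = s
    return event_times, surv


def km_step(event_times, surv, t, side):
    """Evaluate the KM step function: side="right" gives S(t), side="left" gives S(t^-)."""
    idx = np.searchsorted(event_times, t, side=side) - 1
    out = np.ones(len(t))  # S = 1 before the first event time
    ok = idx >= 0
    out[ok] = surv[idx[ok]]
    return out


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_gating(X1, resp, W, lr=0.5, steps=100):
    """Multinomial logistic regression on soft labels (responsibilities) by gradient ascent."""
    for _ in range(steps):
        tau = softmax(X1 @ W)
        W = W + lr * X1.T @ (resp - tau) / len(X1)
    return W


def em_survmix(X, times, events, K=2, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    n = len(times)
    X1 = np.hstack([np.ones((n, 1)), X])      # prepend an intercept column
    W = np.zeros((X1.shape[1], K))
    resp = rng.dirichlet(np.ones(K), size=n)  # random soft initial assignments
    for _ in range(n_iter):
        # M-step: refit the gating model and each cluster's weighted KM curve.
        W = fit_gating(X1, resp, W)
        tau = softmax(X1 @ W)
        lik = np.empty((n, K))
        for k in range(K):
            et, s = weighted_km(times, events, resp[:, k])
            s_at = km_step(et, s, times, side="right")     # S_k(t_i)
            s_before = km_step(et, s, times, side="left")  # S_k(t_i^-)
            lik[:, k] = np.where(events == 1, s_before - s_at, s_at)
        # E-step: r_ik proportional to tau_k(x_i) times the cluster likelihood.
        unnorm = tau * np.maximum(lik, 1e-12)
        resp = unnorm / unnorm.sum(axis=1, keepdims=True)
    return W, resp


# Usage on synthetic data: the sign of one covariate sets a high- or low-risk group.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 1))
rate = np.where(x[:, 0] > 0, 2.0, 0.2)
t_event = rng.exponential(1 / rate)
t_cens = rng.exponential(5.0, size=200)
times = np.minimum(t_event, t_cens)
events = (t_event <= t_cens).astype(int)
W, resp = em_survmix(x, times, events, K=2)
print(resp.argmax(axis=1)[:10])  # hard cluster labels for the first ten individuals
```

The sketch runs a fixed number of iterations; a fuller version would monitor the observed-data log-likelihood for convergence.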
Abstract
Survival analysis aims to predict the timing of future events in fields ranging from medical outcomes to customer churn. However, the integration of clustering into survival analysis, particularly for precision medicine, remains underexplored. This study introduces SurvMixClust, a novel algorithm for survival analysis that integrates clustering with survival function prediction within a unified framework. SurvMixClust learns latent representations for clustering while also predicting individual survival functions using a mixture of non-parametric experts. Our evaluations on five public datasets show that SurvMixClust creates balanced clusters with distinct survival curves, outperforms clustering baselines, and competes with non-clustering survival models in predictive accuracy; performance is assessed with the time-dependent C-index and log-rank metrics.
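To make the time-dependent C-index concrete, the following is a simplified sketch in the style of Antolini et al.; it is not the authors' evaluation code. A pair $(i, j)$ is treated as comparable when $t_i < t_j$ and individual $i$ had the event, and as concordant when $i$'s predicted survival at $t_i$ is lower than $j$'s. Counting equal predictions as 0.5, ignoring tied observed times, and the function name `concordance_td` are assumptions.

```python
import numpy as np


def concordance_td(times, events, surv, grid):
    """Simplified time-dependent concordance index.

    surv[i, j] is the predicted survival of individual i at grid[j].
    Assumes grid[0] <= min(times) so every observed time maps to a grid column.
    """
    # Column of the last grid point <= each observed time.
    col = np.searchsorted(grid, times, side="right") - 1
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # only individuals with an observed event anchor a pair
        s_i = surv[i, col[i]]
        for j in range(n):
            if times[i] < times[j]:
                s_j = surv[j, col[i]]  # j's predicted survival at i's event time
                comparable += 1
                if s_i < s_j:
                    concordant += 1.0
                elif s_i == s_j:
                    concordant += 0.5  # assumed tie convention
    return concordant / comparable if comparable else float("nan")


# Usage: individual 0 is predicted to fail first and individual 2 last.
grid = np.array([0.0, 1.0, 2.0, 3.0])
times = np.array([1.0, 2.0, 3.0])
events = np.array([1, 1, 0])
surv = np.array([
    [1.0, 0.4, 0.2, 0.1],
    [1.0, 0.7, 0.5, 0.3],
    [1.0, 0.9, 0.8, 0.7],
])
print(concordance_td(times, events, surv, grid))  # 1.0
```

Because predictions are compared at each event time rather than through a single risk score, this index can be applied directly to the individual survival curves a mixture model produces.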
