Deep Clustering Survival Machines with Interpretable Expert Distributions
Bojian Hou, Hongming Li, Zhicheng Jiao, Zhen Zhou, Hao Zheng, Yong Fan
TL;DR
The paper tackles heterogeneity in time-to-event data by introducing Deep Clustering Survival Machines (DCSM), which express the conditional survival distribution as a weighted mixture of K constant Weibull experts shared by all instances. An encoder φ_θ maps features X to a latent representation, from which instance-specific weights α_k = softmax(w^T φ_θ(X))_k are computed, so that P(T|X) = Σ_k α_k P(T|μ_k, σ_k). Risk at a horizon t_max is inferred as r_i = 1 - Σ_k α_k S(t_max|μ_k, σ_k), where S(t|μ_k, σ_k) = exp(-(t/σ_k)^{μ_k}) is the Weibull survival function (one minus the CDF) with shape μ_k and scale σ_k. The training objective combines a prior on μ_k and σ_k with evidence lower bound (ELBO) terms for uncensored and censored data, L_all = L_prior - ELBO_U(Θ) - λ ELBO_C(Θ), where λ weights the censored term. This enables simultaneous time-to-event prediction and clustering of each instance by its dominant expert. Experiments on four real datasets and 36 synthetic datasets show competitive predictive performance (C-index) and superior clustering quality (log-rank test), with the learned expert distributions mirroring Kaplan–Meier curves and improving interpretability for personalized prognosis.
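A minimal sketch of the inference step, in plain Python with only the standard library. The logits are hypothetical stand-ins for w^T φ_θ(X) (no encoder is implemented), the expert parameters are made-up values for K = 2, and each expert uses exp(-(t/σ_k)^{μ_k}), i.e. the Weibull survival function with shape μ_k and scale σ_k. The function names (`softmax`, `weibull_survival`, `mixture_survival`, `risk_score`, `assign_cluster`) are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: alpha_k = exp(z_k) / sum_j exp(z_j).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weibull_survival(t: float, shape: float, scale: float) -> float:
    # S(t | mu_k, sigma_k) = exp(-(t / sigma_k) ** mu_k), with mu_k = shape, sigma_k = scale.
    return math.exp(-((t / scale) ** shape))


def mixture_survival(t: float, alphas: Sequence[float],
                     shapes: Sequence[float], scales: Sequence[float]) -> float:
    # Instance-level survival: sum_k alpha_k * S(t | mu_k, sigma_k).
    return sum(a * weibull_survival(t, mu, sigma)
               for a, mu, sigma in zip(alphas, shapes, scales))


def risk_score(t_max: float, alphas: Sequence[float],
               shapes: Sequence[float], scales: Sequence[float]) -> float:
    # r_i = 1 - sum_k alpha_k * S(t_max | mu_k, sigma_k): probability of the event by t_max.
    return 1.0 - mixture_survival(t_max, alphas, shapes, scales)


def assign_cluster(alphas: Sequence[float]) -> int:
    # Cluster label = index of the dominant expert (largest weight alpha_k).
    return max(range(len(alphas)), key=lambda k: alphas[k])


if __name__ == "__main__":
    # Hypothetical K = 2 experts shared by all instances.
    shapes = [1.5, 0.8]    # mu_k
    scales = [10.0, 40.0]  # sigma_k
    # Hypothetical logits standing in for w^T phi_theta(X) of two instances.
    for logits in ([2.0, 0.5], [-1.0, 1.0]):
        alphas = softmax(logits)
        print(assign_cluster(alphas), round(risk_score(12.0, alphas, shapes, scales), 3))
```

Because the experts are shared, two instances differ only in their weights α_k, which is why the index of the largest weight serves as a natural cluster label.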
Abstract
Conventional survival analysis methods are typically ineffective at characterizing heterogeneity in the population, even though such information could assist predictive modeling. In this study, we propose a hybrid survival analysis method, referred to as deep clustering survival machines, that combines discriminative and generative mechanisms. Similar to mixture models, we assume that the timing information of survival data is generatively described by a mixture of a fixed number of parametric distributions, i.e., expert distributions. We learn the weights of the expert distributions for individual instances discriminatively from their features, so that each instance's survival information is characterized by a weighted combination of the learned constant expert distributions. The method also facilitates interpretable subgrouping (clustering) of all instances according to their associated expert distributions. Extensive experiments on both real and synthetic datasets demonstrate that the method obtains promising clustering results and competitive time-to-event prediction performance.
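To make the combination of generative experts and discriminatively learned weights concrete, the objective L_all = L_prior - ELBO_U(Θ) - λ ELBO_C(Θ) from the TL;DR can be sketched in plain Python. This sketch assumes that each ELBO is the Jensen lower bound Σ_k α_k ln p_k(t) ≤ ln Σ_k α_k p_k(t), using the Weibull log-density for uncensored instances and the Weibull log-survival for censored ones. The prior term L_prior is passed in as a placeholder number rather than implemented. All function names and toy values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weibull_log_pdf(t: float, shape: float, scale: float) -> float:
    # ln f(t) = ln(mu / sigma) + (mu - 1) ln(t / sigma) - (t / sigma)^mu, valid for t > 0.
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_log_survival(t: float, shape: float, scale: float) -> float:
    # ln S(t) = -(t / sigma)^mu.
    return -((t / scale) ** shape)


def elbo_terms(times: Sequence[float], events: Sequence[int],
               alphas: Sequence[Sequence[float]],
               shapes: Sequence[float], scales: Sequence[float]) -> Tuple[float, float]:
    """Return (ELBO_U, ELBO_C) under the assumed Jensen bound sum_k alpha_k ln p_k.

    events[i] == 1 marks an observed event (uncensored); 0 marks a censored instance.
    alphas[i] holds the K mixture weights of instance i.
    """
    elbo_u, elbo_c = 0.0, 0.0
    for t, e, a in zip(times, events, alphas):
        if e:
            elbo_u += sum(ak * weibull_log_pdf(t, mu, s)
                          for ak, mu, s in zip(a, shapes, scales))
        else:
            elbo_c += sum(ak * weibull_log_survival(t, mu, s)
                          for ak, mu, s in zip(a, shapes, scales))
    return elbo_u, elbo_c


def total_loss(l_prior: float, elbo_u: float, elbo_c: float, lam: float) -> float:
    # L_all = L_prior - ELBO_U - lambda * ELBO_C (to be minimized).
    return l_prior - elbo_u - lam * elbo_c


if __name__ == "__main__":
    # Hypothetical toy batch: 3 instances, K = 2 shared Weibull experts.
    shapes, scales = [1.5, 0.8], [10.0, 40.0]
    times = [8.0, 15.0, 30.0]
    events = [1, 0, 1]  # 1 = event observed, 0 = censored
    alphas = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
    elbo_u, elbo_c = elbo_terms(times, events, alphas, shapes, scales)
    print(total_loss(l_prior=0.0, elbo_u=elbo_u, elbo_c=elbo_c, lam=0.5))
```

With a single expert (K = 1, α = [1.0]) the bound is tight and each term reduces to the exact Weibull log-likelihood; with several experts it is a lower bound on the mixture log-likelihood.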
