
Generating Survival Interpretable Trajectories and Data

Andrei V. Konstantinov, Stanislav R. Kirpichenko, Lev V. Utkin

TL;DR

The paper tackles time-to-event prediction under censoring by proposing a Wasserstein variational autoencoder that jointly learns survival trajectories and data generation. It uses the Beran estimator within a robust, end-to-end framework to produce $E[T|\mathbf{x}]$ and $S(t|\mathbf{x})$, while also yielding a prototype trajectory $\xi_{\mathbf{x}}(t)$ that encodes how features should change to alter the event time. Key contributions include (i) an encoder–decoder VAE with a Bayes-weighted trajectory in latent space, (ii) a data-generation pipeline for new survival instances with censoring indicators determined by a classifier, and (iii) extensive numerical validation on synthetic and real datasets showing competitive predictive performance and interpretable trajectories. The approach advances survival data generation and explainable trajectories, with practical impact for data augmentation, counterfactual explanations, and risk prediction, and it opens avenues for integrating alternative survival models beyond Beran.

Abstract

A new model for generating survival trajectories and data, based on an autoencoder of a specific structure, is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a newly generated feature vector on the basis of the Beran estimator. Second, the model generates additional data, based on a given training set, that supplement the original dataset. Third, and most important, it generates a prototype time-dependent trajectory for an object, which characterizes how the features of the object could be changed to achieve a different time to an event. The trajectory can be viewed as a type of counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporated into the variational autoencoder. The model also determines the censoring indicators of newly generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model using numerical experiments on synthetic and real datasets. The code of the algorithm implementing the proposed model is publicly available.
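To make the role of the Beran estimator concrete, the following is a minimal NumPy sketch of the standard kernel-weighted Beran estimate of $S(t|\mathbf{x})$ and a step-function approximation of $E[T|\mathbf{x}]$. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the function names `beran_survival` and `expected_time`, and the choice to integrate the survival curve only up to the largest observed time are all assumptions made for illustration. In the paper the estimator operates inside the autoencoder, which this sketch does not attempt to reproduce.

```python
import numpy as np


def beran_survival(x, X_train, times, events, bandwidth=1.0):
    """Beran estimate of S(t | x), evaluated at the sorted training times.

    events[i] is 1 for an observed event and 0 for a censored observation.
    The Gaussian kernel and its bandwidth are illustrative assumptions.
    """
    order = np.argsort(times)
    X_sorted, t_sorted, d_sorted = X_train[order], times[order], events[order]

    # Kernel weights W(x, x_i), normalized to sum to one.
    sq_dist = np.sum((X_sorted - x) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    w = w / w.sum()

    # Total weight of observations that precede observation i in time.
    cum_before = np.concatenate(([0.0], np.cumsum(w)[:-1]))
    denom = np.maximum(1.0 - cum_before, 1e-12)

    # Each uncensored observation contributes the factor
    # 1 - W_i / (1 - sum_{j < i} W_j); censored observations contribute 1.
    ratio = np.minimum(w / denom, 1.0)
    factors = np.where(d_sorted == 1, 1.0 - ratio, 1.0)
    surv = np.cumprod(factors)
    return t_sorted, surv


def expected_time(t_sorted, surv):
    """Area under the step survival curve from 0 to the largest observed time."""
    s_prev = np.concatenate(([1.0], surv[:-1]))   # S just before each time point
    t_prev = np.concatenate(([0.0], t_sorted[:-1]))
    return float(np.sum(s_prev * (t_sorted - t_prev)))


if __name__ == "__main__":
    X = np.array([[0.0], [0.1], [0.2], [0.3]])
    T = np.array([1.0, 2.0, 3.0, 4.0])
    D = np.array([1, 0, 1, 1])
    t, s = beran_survival(np.array([0.15]), X, T, D, bandwidth=0.1)
    print(t, s, expected_time(t, s))
```

When the bandwidth is very large the weights become uniform and the estimate reduces to the Kaplan-Meier estimator, which gives a convenient sanity check for a sketch of this kind.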

Paper Structure (19 sections, 29 equations, 21 figures, 1 table)

Figures (21)

  • Figure 1: A scheme of the proposed model
  • Figure 2: The original set $\mathcal{A}_{r}$ of vectors $\mathbf{x}_{i}$ and the set $\mathcal{D}_{m}$ of vectors $\widetilde{\mathbf{z}}_{1},...,\widetilde{\mathbf{z}}_{m}$ normally distributed around vector $\mathbf{z}$
  • Figure 3: Illustration of generated points $\widehat{\mathbf{x}}$ for the “linear” dataset
  • Figure 4: Generated points $(\widehat{\mathbf{x}},T_{gen})$ for the “linear” dataset
  • Figure 5: Generated trajectories for the “linear” dataset
  • ...and 16 more figures